Constrained steepest descent
in the 2-Wasserstein metric 

By E. A. Carlen and W. Gangbo* 



Abstract 

We study several constrained variational problems in the 2-Wasserstein 
metric for which the set of probability densities satisfying the constraint is 
not closed. For example, given a probability density $F_0$ on $\mathbb{R}^d$ and a time-step
$h > 0$, we seek to minimize $I(F) = hS(F) + W_2^2(F_0, F)$ over all of the probability
densities $F$ that have the same mean and variance as $F_0$, where $S(F)$ is the
entropy of $F$. We prove existence of minimizers. We also analyze the induced
geometry of the set of densities satisfying the constraint on the variance and
means, and we determine all of the geodesics on it. From this, we determine
a criterion for convexity of functionals in the induced geometry. It turns out, 
for example, that the entropy is uniformly strictly convex on the constrained 
manifold, though not uniformly convex without the constraint. The problems 
solved here arose in a study of a variational approach to constructing and 
studying solutions of the nonlinear kinetic Fokker-Planck equation, which is 
briefly described here and fully developed in a companion paper. 
1. Introduction



Recently there has been considerable progress in understanding a wide 
range of dissipative evolution equations in terms of variational problems involving
the Wasserstein metric. In particular, Jordan, Kinderlehrer and Otto
have shown in [12] that the heat equation is gradient flow for the entropy functional
in the 2-Wasserstein metric. We can arrive most rapidly at the point of
departure for our own problem, which concerns constrained gradient flow, by
reviewing this result.

Let $\mathcal{P}$ denote the set of probability densities on $\mathbb{R}^d$ with finite second
moments; i.e., the set of all nonnegative measurable functions $F$ on $\mathbb{R}^d$ such
that $\int_{\mathbb{R}^d} F(v)\,dv = 1$ and $\int_{\mathbb{R}^d} |v|^2 F(v)\,dv < \infty$. We use $v$ and $w$ to denote points
in $\mathbb{R}^d$ since in the problem to be described below they represent velocities.
Equip $\mathcal{P}$ with the 2-Wasserstein metric, $W_2(F_0, F_1)$, where

(1.1)   $W_2^2(F_0,F_1) = \inf_{\gamma \in \mathcal{C}(F_0,F_1)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|v-w|^2\,\gamma(dv,dw)$ .

Here, $\mathcal{C}(F_0, F_1)$ consists of all couplings of $F_0$ and $F_1$; i.e., all probability
measures $\gamma$ on $\mathbb{R}^d \times \mathbb{R}^d$ such that for all test functions $\eta$ on $\mathbb{R}^d$,

   $\int_{\mathbb{R}^d \times \mathbb{R}^d} \eta(v)\,\gamma(dv,dw) = \int_{\mathbb{R}^d} \eta(v)F_0(v)\,dv$

and

   $\int_{\mathbb{R}^d \times \mathbb{R}^d} \eta(w)\,\gamma(dv,dw) = \int_{\mathbb{R}^d} \eta(w)F_1(w)\,dw$ .

The infimum in (1.1) is actually a minimum, and it is attained at a unique
point $\gamma_{F_0,F_1}$ in $\mathcal{C}(F_0,F_1)$. Brenier [3] was able to characterize this unique
minimizer, and then further results of Caffarelli [4], Gangbo [10] and McCann
[16] shed considerable light on the nature of this minimizer.
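As a concrete illustration of (1.1), the following minimal sketch (not taken from the paper) evaluates the squared distance between two empirical measures on the real line that have equally many, equally weighted atoms. It relies on the standard fact that in one dimension an optimal coupling pairs the sorted atoms, and it keeps the factor 1/2 used in (1.1). The function name `w2_squared_1d` is an illustrative choice.

```python
def w2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance, with the factor 1/2 of (1.1),
    between two empirical measures on the line with equally many atoms.

    Assumption: in one dimension an optimal coupling matches the i-th
    smallest atom of xs with the i-th smallest atom of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same number of atoms")
    n = len(xs)
    return sum(0.5 * (x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / n


# Translating a sample by c costs c**2 / 2 under this normalization.
sample = [-1.0, 0.0, 0.5, 2.0]
shifted = [x + 3.0 for x in sample]
cost = w2_squared_1d(sample, shifted)
```

With the factor 1/2 in place, a point mass at distance $r$ from another point mass has squared distance $r^2/2$, which is the normalization behind the radius $\sqrt{d\theta/2}$ that appears in Section 3.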

Next, let the entropy S{F) be defined by 

(1.2)   $S(F) = \int_{\mathbb{R}^d} F(v)\ln F(v)\,dv$ .

This is well defined, with $\infty$ as a possible value, since $\int_{\mathbb{R}^d} |v|^2F(v)\,dv < \infty$.

The following scheme for solving the linear heat equation was introduced 
in [12]: Fix an initial density $F_0$ with $\int_{\mathbb{R}^d} |v|^2 F_0(v)\,dv$ finite, and also fix a time
step $h > 0$. Then inductively define $F_k$ in terms of $F_{k-1}$ by choosing $F_k$ to
minimize the functional

(1.3)   $F \mapsto \left[ W_2^2(F_{k-1},F) + hS(F) \right]$

on $\mathcal{P}$. It is shown in [12] that there is a unique minimizer $F_k \in \mathcal{P}$, so that each
$F_k$ is well defined. Then the time-dependent probability density $F^{(h)}(v,t)$ is
defined by putting $F^{(h)}(v, kh) = F_k$ and interpolating when $t$ is not an integral
multiple of $h$. Finally, it is shown that for each $t$, $F(\cdot,t) = \lim_{h \to 0} F^{(h)}(\cdot,t)$
exists weakly in $L^1(\mathbb{R}^d)$, and that the resulting time-dependent probability density
solves the heat equation $\partial F(v,t)/\partial t = \Delta F(v,t)$ with $\lim_{t \to 0} F(\cdot,t) = F_0$.

This variational approach is particularly useful when the functional being 
minimized with each time step is convex in the geometry associated to the 
2-Wasserstein metric. It makes sense to speak of convexity in this context 
since, as McCann showed [16], when $\mathcal{P}$ is equipped with the 2-Wasserstein
metric, every pair of elements $F_0$ and $F_1$ is connected by a unique continuous
path $t \mapsto F_t$, $0 \le t \le 1$, such that $W_2(F_0, F_t) + W_2(F_t, F_1) = W_2(F_0, F_1)$ for all
such $t$. It is natural to refer to this path as the geodesic connecting $F_0$ and $F_1$,
and we shall do so. A functional $\Phi$ on $\mathcal{P}$ is displacement convex in McCann's
sense if $t \mapsto \Phi(F_t)$ is convex on $[0, 1]$ for every $F_0$ and $F_1$ in $\mathcal{P}$. It turns out
that the entropy $S(F)$ is a convex function of $F$ in this sense.

Gradient flows of convex functions in Euclidean space are well known to 
have strong contractive properties, and Otto [18] showed that the same is true 
in $\mathcal{P}$, and applied this to obtain strong new results on the rate of relaxation of
certain solutions of the porous medium equation.

Our aim is to extend this line of analysis to a range of problems that are 
not purely dissipative, but which also satisfy certain conservation laws. An 
important example of such an evolution is given by the Boltzmann equation

   $\frac{\partial}{\partial t} f(x,v,t) + \nabla_x \cdot \left( v f(x,v,t) \right) = Q(f)(x,v,t)$

where for each $t$, $f(\cdot,\cdot,t)$ is a probability density on the phase space $\Lambda \times \mathbb{R}^d$
of a molecule in a region $\Lambda \subset \mathbb{R}^d$, and $Q$ is a nonlinear operator representing
the effects of collisions on the evolution of molecular velocities. This evolution
is dissipative and decreases the entropy while formally conserving the energy
$\int_{\Lambda \times \mathbb{R}^d} |v|^2 f(x,v,t)\,dx\,dv$ and the momentum $\int_{\Lambda \times \mathbb{R}^d} v f(x,v,t)\,dx\,dv$. A good deal
is known about this equation [7], but there is not yet an existence theorem for
solutions that conserve the energy, nor is there any general uniqueness result.

The investigation in this paper arose in the study of a related equation, the
nonlinear kinetic Fokker-Planck equation, to which we have applied an analog
of the scheme in [12] to the evolution of the conditional probability densities
$F(v;x)$ for the velocities of the molecules at $x$; i.e., for the contributions of
the collisions to the evolution of the distribution of velocities of particles in a
gas. These collisions are supposed to conserve both the "bulk velocity" $u$ and
"temperature" $\theta$ of the distribution, where

(1.4)   $u = \int_{\mathbb{R}^d} v F(v;x)\,dv$   and   $\theta = \frac{1}{d}\int_{\mathbb{R}^d} |v-u|^2 F(v;x)\,dv$ .

For this reason we add a constraint to the variational problem in [12]. Let
$u \in \mathbb{R}^d$ and $\theta > 0$ be given. Define the subset $\mathcal{E}_{u,\theta}$ of $\mathcal{P}$ specified by

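(1.5)   $\mathcal{E}_{u,\theta} = \left\{ F \in \mathcal{P} \;\middle|\; \frac{1}{d}\int_{\mathbb{R}^d} |v-u|^2F(v)\,dv = \theta \ \text{ and } \ \int_{\mathbb{R}^d} vF(v)\,dv = u \right\}$ .

This is the set of all probability densities with mean $u$ and variance $d\theta$,
and we use $\mathcal{E}$ to denote it because the constraint on the variance is interpreted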
as an internal energy constraint in the context discussed above. 
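Then given $F_0 \in \mathcal{E}_{u,\theta}$, define the functional $I(F)$ on $\mathcal{E}_{u,\theta}$ by

(1.6)   $I(F) = \left[ \frac{W_2^2(F_0,F)}{\theta} + hS(F) \right]$ .

Our main goal is to study the minimization problem associated with determining

(1.7)   $\inf\left\{\, I(F) \;\middle|\; F \in \mathcal{E}_{u,\theta} \,\right\}$ .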

Note that this problem is scale invariant in that if $F_0$ is rescaled, the minimizer
$F$ will be rescaled in the same way, and in any case, this normalization, with
$\theta$ in the denominator, is dimensionally natural.

Since the constraint is not weakly closed, existence of minimizers does not
follow as easily as in the unconstrained case. The same difficulty arises in the
determination of the geodesics in $\mathcal{E}_{u,\theta}$.

We build on previous work on the geometry of $\mathcal{P}$ in the 2-Wasserstein
metric, and Section 2 contains a brief exposition of the relevant results. While
this section is largely review, several of the simple proofs given here do not
seem to be in the literature, and are more readily adapted to the constrained
setting.

In Section 3, we analyze the geometry of $\mathcal{E}$, and determine its geodesics.
As mentioned above, since $\mathcal{E}$ is not weakly closed, direct methods do not yield
the geodesics. The characterization of the geodesics is quite explicit, and from
it we deduce a criterion for convexity in $\mathcal{E}$, and show that the entropy is
uniformly strictly convex, in contrast with the unconstrained case.

In Section 4, we turn to the variational problem (1.7), and determine the 
Euler-Lagrange equation associated with it, and several consequences of the 
Euler-Lagrange equation. 

In Section 5 we introduce a variational problem that is dual to (1.7), and 
by analyzing it, we produce a minimizer for $I(F)$. We conclude the paper in
Section 6 by discussing some open problems and possible applications. 

We would like to thank Robert McCann and Cédric Villani for many
enlightening discussions on the subject of mass transport. We would also like
to thank the referee, whose questions and suggestions have led us to clarify
the exposition significantly.
2. Riemannian geometry of the 2-Wasserstein metric

The purpose of this section is to collect a number of facts concerning the 
2-Wasserstein metric and its associated Riemannian geometry. The Rieman- 
nian point of view has been developed by several authors, prominently includ- 
ing McCann, Otto, and Villani. Though for the most part the facts presented 
in this section are known, there is no single convenient reference for all of them. 
Moreover, it seems that some of the proofs and formulae that we use do not 
appear elsewhere in the literature. 

We begin by recalling the identification of the geodesics in $\mathcal{P}$ equipped
with the 2-Wasserstein metric. The fundamental facts from which we start
are these: The infimum in (1.1) is actually a minimum, and it is attained at
a unique point $\gamma_{F_0,F_1}$ in $\mathcal{C}(F_0,F_1)$, and this measure is such that there exists
a pair of dual convex functions $\phi$ and $\psi$ such that for all bounded measurable
functions $\eta$ on $\mathbb{R}^d \times \mathbb{R}^d$,

(2.1)   $\int_{\mathbb{R}^d \times \mathbb{R}^d} \eta(v,w)\,\gamma_{F_0,F_1}(dv,dw) = \int_{\mathbb{R}^d} \eta(v,\nabla\phi(v))F_0(v)\,dv = \int_{\mathbb{R}^d} \eta(\nabla\psi(w),w)F_1(w)\,dw$ .

In particular, for all bounded measurable functions $\eta$ on $\mathbb{R}^d$,

(2.2)   $\int_{\mathbb{R}^d} \eta(\nabla\phi(v))F_0(v)\,dv = \int_{\mathbb{R}^d} \eta(w)F_1(w)\,dw$ ,

and $\nabla\phi$ is the unique gradient of a convex function defined on the convex hull
of the support of $F_0$ so that (2.2) holds for all such $\eta$.

Recall that for any convex function $\psi$ on $\mathbb{R}^d$, $\psi^*$ denotes its Legendre
transform; i.e., the dual convex function, which is defined through

(2.3)   $\psi^*(w) = \sup_{v \in \mathbb{R}^d} \left\{ w \cdot v - \psi(v) \right\}$ .

The convex functions $\psi$ arising as optimizers in (2.1) have the further property
that $(\psi^*)^* = \psi$. Being convex, both $\psi$ and $\psi^*$ are locally Lipschitz and
differentiable on the complement of a set of Hausdorff dimension $d-1$. (It is for
this reason that we work with densities instead of measures; $\nabla\psi\#\mu$ might not
be well defined if $\mu$ charged sets of Hausdorff dimension $d-1$.) In our quotation
of Brenier's result concerning (2.1), the statement that the convex functions
$\psi$ and $\phi$ in (2.1) are a dual pair simply means that $\phi = \psi^*$ and $\psi = \phi^*$. It
follows from (2.3) that $\nabla\psi$ and $\nabla\psi^*$ are inverse transformations in that

(2.4)   $\nabla\psi(\nabla\psi^*(w)) = w$   and   $\nabla\psi^*(\nabla\psi(v)) = v$

for $F_1(w)\,dw$ almost every $w$ and $F_0(v)\,dv$ almost every $v$ respectively.
Given a map $T : \mathbb{R}^d \to \mathbb{R}^d$ and $F \in \mathcal{P}$, define $T\#F \in \mathcal{P}$ by

   $\int_{\mathbb{R}^d} \eta(v)\,(T\#F(v))\,dv = \int_{\mathbb{R}^d} \eta(T(v))F(v)\,dv$

for all test functions $\eta$ on $\mathbb{R}^d$. Then we can express (2.2) more briefly by writing
$\nabla\phi\#F_0 = F_1$. The uniqueness of the gradient of the convex potential is very
useful for computing $W_2^2(F_0, F_1)$ since if one can find some convex function $\tilde\phi$
such that $\nabla\tilde\phi\#F_0 = F_1$, then $\tilde\phi$ is the potential for the minimizing map and

(2.5)   $W_2^2(F_0,F_1) = \int_{\mathbb{R}^d} \frac{1}{2}|v - \nabla\tilde\phi(v)|^2F_0(v)\,dv$ .

Now it is easy to determine the geodesics. These are given in terms of
a natural interpolation between two densities $F_0$ and $F_1$ that was introduced
and applied by McCann in his thesis [15] and in [16].

Fix two densities $F_0$ and $F_1$ in $\mathcal{P}$. Let $\psi$ be the convex function on $\mathbb{R}^d$
such that $(\nabla\psi)\#F_0 = F_1$. Then for any $t$ with $0 < t < 1$, define the convex
function $\psi_t$ by

(2.6)   $\psi_t(v) = (1-t)\frac{|v|^2}{2} + t\psi(v)$

and define the density $F_t$ by

(2.7)   $F_t = \nabla\psi_t\#F_0$ .

At $t = 0$, $\nabla\psi_t$ is the identity, while at $t = 1$, it is $\nabla\psi$.

Clearly for each $0 \le t \le 1$, $\psi_t$ is convex, and so the map $\nabla\psi_t$ gives the
optimal transport from $F_0$ to $F_t$. What map gives the optimal transport from
$F_t$ onto $F_1$?

By definition $\nabla\psi_t\#F_0 = F_t$. It follows from (2.4) that $\nabla(\psi_t)^*\#F_t = F_0$,
and therefore that $\nabla\psi \circ \nabla(\psi_t)^*\#F_t = F_1$. It turns out that $\nabla\psi \circ \nabla(\psi_t)^*$ is the
optimal transport from $F_t$ onto $F_1$. This composition property of the optimal
transport maps along a McCann interpolation path provides the key to several
of the theorems in the next section, and is the basis of short proofs of other
known results. It is the essential observation made in this section.

To see that $\nabla\psi \circ \nabla(\psi_t)^*$ is the optimal transport map from $F_t$ onto $F_1$,
it suffices to show that it is the gradient of a convex function. From (2.6),
$\nabla\psi_t(v) = (1-t)v + t\nabla\psi(v)$, which is the same as $t\nabla\psi(v) = \nabla\psi_t(v) - (1-t)v$.
Then by (2.4), applied with $v = \nabla(\psi_t)^*(w)$,

(2.8)   $\nabla\psi \circ \nabla(\psi_t)^*(w) = \frac{1}{t}\left( w - (1-t)\nabla(\psi_t)^*(w) \right)$ .

Thus, $\nabla\psi \circ \nabla(\psi_t)^*(w)$ is a gradient. There are at least two ways to proceed
from here. Assuming sufficient regularity of $\psi$ and $\psi^*$, one can differentiate
(2.4) and see that $\mathrm{Hess}\,\psi(\nabla\psi^*(w))\,\mathrm{Hess}\,\psi^*(w) = I$. That is, the Hessians of $\psi$
and $\psi^*$ are inverse to one another. Since $\mathrm{Hess}\,\psi_t(v) \ge (1-t)I$, this provides
an upper bound on the Hessian of $(\psi_t)^*$ which can be used to show that the
right side of (2.8) is the gradient of a convex function. This can be made
rigorous in our setting, but the argument is somewhat technical, and involves
the definition of the Hessian in the sense of Alexandroff.

There is a much simpler way to proceed. As McCann showed [15], if $\tilde F_t$
is the path one gets interpolating between $F_0$ and $F_1$ but starting at $F_1$, then
$\tilde F_t = F_{1-t}$. So $\nabla\left((\psi^*)_{1-t}\right)^*$ is the optimal transport map from $F_t$ onto $F_1$. This
tells us which convex function should have $\nabla\psi \circ \nabla(\psi_t)^*(w)$ as its gradient, and
this is easily checked using the mini-max theorem.

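Lemma 2.1 (Interpolation and Legendre transforms). Let $\psi$ be a convex
function such that $\psi = \psi^{**}$. Then by the interpolation in (2.6),

(2.9)   $\left((\psi^*)_{1-t}\right)^*(w) = \frac{1}{t}\left( \frac{|w|^2}{2} - (1-t)(\psi_t)^*(w) \right)$ .

Proof. Calculating, with use of the mini-max theorem, one has

$$\begin{aligned}
\left((\psi^*)_{1-t}\right)^*(w)
&= \sup_z \left\{ z \cdot w - \left( t\frac{|z|^2}{2} + (1-t)\psi^*(z) \right) \right\} \\
&= \sup_z \left\{ z \cdot w - t\frac{|z|^2}{2} - (1-t)\sup_v \left\{ v \cdot z - \psi(v) \right\} \right\} \\
&= \sup_z \inf_v \left\{ z \cdot (w - (1-t)v) - t\frac{|z|^2}{2} + (1-t)\psi(v) \right\} \\
&= \inf_v \sup_z \left\{ z \cdot (w - (1-t)v) - t\frac{|z|^2}{2} + (1-t)\psi(v) \right\} \\
&= \frac{1}{t}\left( \frac{|w|^2}{2} - (1-t)(\psi_t)^*(w) \right) . \qquad \square
\end{aligned}$$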
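Lemma 2.1 can be sanity-checked numerically. The sketch below is not part of the paper: it picks an illustrative one-dimensional potential $\psi(v) = v^4/4 + v^2/2$ (an assumption made here), approximates every Legendre transform by a maximum over a finite grid, and compares $\left((\psi^*)_{1-t}\right)^*(w)$ with $\frac{1}{t}\left(|w|^2/2 - (1-t)(\psi_t)^*(w)\right)$ at a few points $w$. Grid ranges, grid sizes and helper names are arbitrary choices.

```python
def legendre(f_vals, grid, w):
    """Discrete Legendre transform: max over the grid of w*v - f(v)."""
    return max(w * v - fv for v, fv in zip(grid, f_vals))


def linspace(lo, hi, n):
    """n equally spaced points from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def psi(v):
    # Illustrative smooth convex potential (an assumption, not from the paper).
    return v ** 4 / 4 + v ** 2 / 2


t = 0.5
v_grid = linspace(-4.0, 4.0, 801)
z_grid = linspace(-3.0, 3.0, 601)

psi_vals = [psi(v) for v in v_grid]
# psi_t(v) = (1 - t) |v|^2 / 2 + t psi(v), as in (2.6).
psi_t_vals = [(1 - t) * v * v / 2 + t * p for v, p in zip(v_grid, psi_vals)]
# (psi^*)_{1-t}(z) = t |z|^2 / 2 + (1 - t) psi^*(z): the same interpolation
# applied to psi^* with parameter 1 - t.
psi_star_vals = [legendre(psi_vals, v_grid, z) for z in z_grid]
interp_star_vals = [t * z * z / 2 + (1 - t) * ps
                    for z, ps in zip(z_grid, psi_star_vals)]


def lhs(w):
    """Left side of the lemma: Legendre transform of (psi^*)_{1-t}."""
    return legendre(interp_star_vals, z_grid, w)


def rhs(w):
    """Right side of the lemma."""
    return (w * w / 2 - (1 - t) * legendre(psi_t_vals, v_grid, w)) / t


max_gap = max(abs(lhs(w) - rhs(w)) for w in (-1.0, -0.3, 0.0, 0.4, 1.0))
```

The remaining discrepancy `max_gap` comes only from the grid discretization and shrinks as the grids are refined.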



As an immediate consequence, 

(2.10)   $\nabla\left((\psi^*)_{1-t}\right)^* = \nabla\psi \circ \nabla(\psi_t)^*$

is the optimal transport from $F_t$ to $F_1$. This also implies that $\nabla\psi_t\#F_0 =
\nabla(\psi^*)_{1-t}\#F_1$, as shown by McCann in [15] using a "cyclic monotonicity"
argument. Lemma 2.1 leads to a simple proof of another result of McCann, again
from [15]:

Theorem 2.2 (Geodesics for the 2-Wasserstein metric). Fix two densities
$F_0$ and $F_1$ in $\mathcal{P}$. Let $\psi$ be the convex function on $\mathbb{R}^d$ such that $(\nabla\psi)\#F_0 = F_1$.
Then for any $t$ with $0 < t < 1$, define the convex function $\psi_t$ by (2.6) and define
the density $F_t$ by (2.7). Then for all $0 < t < 1$,

(2.11)   $W_2(F_0,F_t) = tW_2(F_0,F_1)$   and   $W_2(F_t, F_1) = (1-t)W_2(F_0, F_1)$

and $t \mapsto F_t$ is the unique path from $F_0$ to $F_1$ for the 2-Wasserstein metric
that has this property. In particular, there is exactly one geodesic for the
2-Wasserstein metric connecting any two densities in $\mathcal{P}$.

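Proof. It follows from (2.5) that

   $W_2^2(F_0,F_t) = \int_{\mathbb{R}^d} \frac{1}{2}|v - \nabla\psi_t(v)|^2F_0(v)\,dv = t^2\int_{\mathbb{R}^d} \frac{1}{2}|v - \nabla\psi(v)|^2F_0(v)\,dv = t^2W_2^2(F_0,F_1)$ .

Next, since $\nabla\psi \circ \nabla(\psi_t)^*$ is the optimal transport from $F_t$ to $F_1$, by (2.9),

   $W_2^2(F_t,F_1) = \int_{\mathbb{R}^d} \frac{1}{2}|w - \nabla\psi \circ \nabla(\psi_t)^*(w)|^2F_t(w)\,dw = \int_{\mathbb{R}^d} \frac{1}{2}|\nabla\psi_t(v) - \nabla\psi(v)|^2F_0(v)\,dv = (1-t)^2W_2^2(F_0,F_1)$ .

Together, the last two computations give us (2.11).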

The uniqueness follows from a strict convexity property of the distance:
For any probability density $G_0$, the function $G \mapsto W_2^2(G_0, G)$ is strictly convex
on $\mathcal{P}$ in that for any pair $G_1$, $G_2$ in $\mathcal{P}$ and any $t$ with $0 < t < 1$,

(2.12)   $W_2^2(G_0, (1-t)G_1 + tG_2) \le (1-t)W_2^2(G_0, G_1) + tW_2^2(G_0, G_2)$

and there is equality if and only if $G_1 = G_2$. This follows easily from the
uniqueness of the optimal coupling specified in (2.1); nontrivial convex
combinations of such couplings are not of the form (2.1), and therefore cannot be
optimal.

Now suppose that there are two geodesics $t \mapsto F_t$ and $t \mapsto \tilde F_t$. Pick some $t_0$
with $F_{t_0} \ne \tilde F_{t_0}$. Then the path consisting of a geodesic from $F_0$ to $(F_{t_0} + \tilde F_{t_0})/2$,
and from there onto $F_1$ would have a strictly shorter length than the geodesic
from $F_0$ to $F_1$, which cannot be. □

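To obtain an Eulerian description of these geodesics, let $f$ be any smooth
function on $\mathbb{R}^d$, and compute:

(2.13)
$$\begin{aligned}
\frac{d}{dt}\int_{\mathbb{R}^d} f(w)F_t(w)\,dw
&= \frac{d}{dt}\int_{\mathbb{R}^d} f(\nabla\psi_t(v))F_0(v)\,dv \\
&= \int_{\mathbb{R}^d} \nabla f(\nabla\psi_t(v)) \cdot \left[ \nabla\psi(v) - v \right] F_0(v)\,dv \\
&= -\int_{\mathbb{R}^d} \nabla f(w) \cdot \left[ \nabla(\psi_t)^*(w) - \nabla\psi(\nabla(\psi_t)^*(w)) \right] F_t(w)\,dw .
\end{aligned}$$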
In other words, when $F_t$ is defined in terms of $F_0$ and $\psi$ as in (2.6) and (2.7),
$F_t$ is a weak solution to

(2.14)   $\frac{\partial}{\partial t}F_t(w) + \nabla \cdot \left( W(w,t)F_t(w) \right) = 0$

where, according to Lemma 2.1,

(2.15)   $W(w,t) = \frac{w - \nabla(\psi_t)^*(w)}{t} = \nabla\left( \frac{1}{t}\left( \frac{|w|^2}{2} - (\psi_t)^*(w) \right) \right)$ .

In light of the first two equalities in (2.13),

(2.16)   $W(w,0) = \nabla\left( \psi(w) - \frac{|w|^2}{2} \right) = \nabla\psi(w) - w$ .

This gradient vector field can be viewed as giving the "tangent direction" to
the geodesic $t \mapsto F_t$ at $t = 0$.

We would like to identify some subspace of the space of gradient vector
fields as the tangent space $T_{F_0}$ to $\mathcal{P}$ at $F_0$. Towards this end we ask: Given a
smooth, rapidly decaying function $\eta$ on $\mathbb{R}^d$, is there a geodesic $t \mapsto F_t$ passing
through $F_0$ at $t = 0$ so that, in the weak sense,

(2.17)   $\left. \left( \frac{\partial}{\partial t}F_t + \nabla \cdot (\nabla\eta\,F_t) \right) \right|_{t=0} = 0$ ?

The next theorem says that this is the case, and provides us with a geodesic
for which (2.17) holds, provided $\eta$ is sufficiently small. But then by changing the time
parametrization, we obtain a geodesic, possibly quite short, that has any
multiple of $\nabla\eta$ as its initial "tangent vector".

Theorem 2.3 (Tangents to geodesics). Let $\eta$ be any smooth, rapidly
decaying function on $\mathbb{R}^d$ such that

(2.18)   $\psi(v) = \frac{|v|^2}{2} + \eta(v)$

is strictly convex. For any density $F_0$ in $\mathcal{P}$, and $t$ with $0 \le t \le 1$, define

(2.19)   $\nabla\psi_t(v) = (1-t)v + t\nabla\psi(v) = v + t\nabla\eta(v)$ .

Then for all $t$ with $0 \le t \le 1$, $F_t = \nabla\psi_t\#F_0$ is absolutely continuous, and is a
weak solution of

(2.20)   $\frac{\partial}{\partial t}F_t(v) + \nabla \cdot \left( \nabla\eta_t(v)F_t(v) \right) = 0$ ,

where

(2.21)   $\eta_t(v) = \frac{1}{t}\left( \frac{|v|^2}{2} - (\psi_t)^*(v) \right)$ .
Moreover,

(2.22)   $\nabla\eta_t(v) = \nabla\eta(v) - \frac{t}{2}\nabla|\nabla\eta(v)|^2 + t^2\nabla R_t(v)$ ,

where the remainder term $\nabla R_t(v)$ satisfies $\|\nabla R_t\|_\infty \le \|\mathrm{Hess}(\eta)\|_\infty^2$ uniformly
in $t$.

Proof. First, the fact that $\nabla\psi_t\#F_0$ is absolutely continuous follows from
the fact that $\nabla(\psi_t)^*$ is Lipschitz. Formulas (2.20) and (2.21) follow directly
from (2.14) and (2.15).

To obtain (2.22), use (2.4) to see that $\nabla(\psi_t)^*(v) = \Phi(\nabla(\psi_t)^*(v))$ where
$\Phi(w) = v - t\nabla\eta(w)$. Iterating this fixed point equation three times yields
(2.22). □

In light of Theorems 2.2 and 2.3, we now know that every geodesic $t \mapsto F_t$
through $F_0$ at $t = 0$ satisfies (2.17), and conversely, for every smooth rapidly
decaying gradient vector field $\nabla\eta$, there is a geodesic $t \mapsto F_t$ through $F_0$ at $t = 0$
satisfying (2.17) for that function $\eta$. Moreover, along this geodesic

(2.23)   $W_2^2(F_0,F_t) = \frac{t}{2}\int_0^t \left( \int_{\mathbb{R}^d} |\nabla\eta_s(v)|^2F_s(v)\,dv \right) ds = \frac{t^2}{2}\int_{\mathbb{R}^d} |\nabla\eta(v)|^2F_0(v)\,dv$ ,

where $\eta_s$ is related to $\eta$ as in Theorem 2.3.

Furthermore if $t \mapsto F_t$ is a path in $\mathcal{P}$ satisfying (2.17) for some gradient
vector field $\nabla\eta$, then this vector field is unique. For suppose that $t \mapsto F_t$ also
satisfies

(2.24)   $\left. \left( \frac{\partial}{\partial t}F_t + \nabla \cdot (\nabla\xi\,F_t) \right) \right|_{t=0} = 0$ .

Then, $\nabla \cdot (\nabla(\eta - \xi)F_0) = 0$. Integrating against $\eta - \xi$, we obtain that

   $\int_{\mathbb{R}^d} |\nabla\eta - \nabla\xi|^2F_0(v)\,dv = 0$ .

Careful consideration of this well-known argument, inserting a cut-off function
before integrating by parts, reveals that all it requires is that both $\nabla\eta$ and $\nabla\xi$
are square integrable with respect to $F_0$. This justifies the identification of the
tangent vector $\partial F/\partial t$ with $\nabla\eta$ when (2.17) holds and $\nabla\eta$ is square integrable
with respect to $F_0$.

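This identifies the "tangent vector" $\partial F_t/\partial t$ with $\nabla\eta$, and gives us the
Riemannian metric, first introduced by Otto [18],

(2.25)   $g\left( \frac{\partial F}{\partial t}, \frac{\partial F}{\partial t} \right) = \frac{1}{2}\int_{\mathbb{R}^d} |\nabla\eta(v)|^2F_0(v)\,dv$ .

By (2.23), the distance on $\mathcal{P}$ induced by this metric is the 2-Wasserstein
distance.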
Interestingly, Theorem 2.2 provides a global description of the geodesics
without having to first determine and study the Riemannian metric.
Theorem 2.3 gives an Eulerian characterization of the geodesics which provides a
complement to McCann's original Lagrangian characterization. Another
Eulerian analysis of the geodesics in terms of the Hamilton-Jacobi equation seems
to be folklore in the subject. A clear account can be found in recent lecture
notes of Villani [22].

We now turn to the notion of convexity on $\mathcal{P}$ with respect to the
2-Wasserstein metric. A functional $\Phi$ on $\mathcal{P}$ is said to be displacement convex
at $F_0$ in case $t \mapsto \Phi(F_t)$ is convex on some neighborhood of $0$ for all
geodesics $t \mapsto F_t$ passing through $F_0$ at $t = 0$. A functional $\Phi$ on $\mathcal{P}$ is said to
be displacement convex if it is displacement convex at all points $F_0$ of $\mathcal{P}$.

If moreover $t \mapsto \Phi(F_t)$ is twice differentiable, we can check for displacement
convexity by computing the Hessian:

(2.26)   $\mathrm{Hess}\,\Phi(F_0)(\nabla\eta, \nabla\eta) = \left. \frac{d^2}{dt^2}\Phi(F_t) \right|_{t=0}$

where $\nabla\eta$ is the tangent to the geodesic at $t = 0$.



Theorem 2.4 (Displacement convexity). If the functional $\Phi$ on $\mathcal{P}$ is
given by

(2.27)   $\Phi(F) = \int_{\mathbb{R}^d} g(F(v))\,dv$

where $g$ is a twice differentiable convex function on $\mathbb{R}_+$, then $\Phi$ is displacement
convex if

(2.28)   $tg'(t) - g(t) \ge 0$   and   $t^2g''(t) - tg'(t) + g(t) \ge 0$



Proof. We check for convexity at a density $F_0$ in the domain of $\Phi$. By a
standard mollification, we can find a sequence of smooth densities $F_0^{(n)}$ with
$\lim_{n \to \infty} F_0^{(n)} = F_0$ and $\lim_{n \to \infty} \Phi(F_0^{(n)}) = \Phi(F_0)$. Fix any smooth rapidly
decaying function $\eta$, such that (taking a small multiple if need be) $|v|^2/2 + \eta(v)$
is strictly convex. Then with $\nabla\psi_t$ defined as in (2.19),

   $t \mapsto \nabla\psi_t\#F_0^{(n)} = F_t^{(n)}$

gives a geodesic passing through $F_0^{(n)}$ at $t = 0$ with the tangent direction $\nabla\eta$,
and defined for $0 \le t \le 1$ uniformly in $n$. Also, $\lim_{n \to \infty} \Phi(F_t^{(n)}) = \Phi(F_t)$
for all such $t$. Therefore, it suffices to show that for each $n$, $t \mapsto \Phi(F_t^{(n)})$ is
convex. In other words, we may assume that $F_0$ is smooth. Then so is each $F_t$,
since $F_t(w) = F_0(\nabla(\psi_t)^*(w))\det\left( \mathrm{Hess}\,(\psi_t)^*(w) \right)$ is a composition of smooth
functions. We may now check convexity by differentiating.
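By (2.20),

$$\begin{aligned}
\frac{d}{dt}\int_{\mathbb{R}^d} g(F_t(v))\,dv
&= -\int_{\mathbb{R}^d} g'(F_t(v))\,\nabla \cdot \left( \nabla\eta_t(v)F_t(v) \right) dv \\
&= \int_{\mathbb{R}^d} \left( g''(F_t(v))\nabla F_t(v) \right) \cdot \left( \nabla\eta_t(v)F_t(v) \right) dv .
\end{aligned}$$

Defining $h(t) = tg'(t) - g(t)$ so that $h'(t) = tg''(t)$, one has from (2.20) that

(2.29)   $\frac{d}{dt}\Phi(F_t) = \int_{\mathbb{R}^d} \nabla h(F_t(v)) \cdot \nabla\eta_t(v)\,dv$ .

To differentiate a second time, use (2.22) to obtain

   $\left. \frac{d^2}{dt^2}\Phi(F_t) \right|_{t=0} = \int_{\mathbb{R}^d} \nabla h(F_0) \cdot \nabla\left( -\frac{1}{2}|\nabla\eta|^2 \right) dv - \int_{\mathbb{R}^d} \left( \left. \frac{\partial}{\partial t}h(F_t) \right|_{t=0} \right) (\Delta\eta)\,dv$ .

But

   $\left. \frac{\partial}{\partial t}F_t \right|_{t=0} = -\nabla \cdot (\nabla\eta\,F_0) = -F_0\,\Delta\eta - \nabla F_0 \cdot \nabla\eta$

and hence, since $\nabla h(F_0) = F_0\,g''(F_0)\nabla F_0$,

   $\left. \frac{\partial}{\partial t}h(F_t) \right|_{t=0} = -F_0^2\,g''(F_0)(\Delta\eta) - \nabla h(F_0) \cdot \nabla\eta$ .

Therefore

(2.30)
$$\begin{aligned}
\left. \frac{d^2}{dt^2}\Phi(F_t) \right|_{t=0}
&= \int_{\mathbb{R}^d} \nabla h(F_0) \cdot \left( -\frac{1}{2}\nabla|\nabla\eta|^2 + (\Delta\eta)\nabla\eta \right) dv + \int_{\mathbb{R}^d} F_0^2\,g''(F_0)(\Delta\eta)^2\,dv \\
&= \int_{\mathbb{R}^d} h(F_0)\,\|\mathrm{Hess}\,\eta\|^2\,dv + \int_{\mathbb{R}^d} \left( F_0^2\,g''(F_0) - h(F_0) \right) (\Delta\eta)^2\,dv .
\end{aligned}$$

Here, $\|\mathrm{Hess}\,\eta\|^2$ denotes the square of the Hilbert-Schmidt norm of the Hessian
of $\eta$. This quantity is positive whenever $h(F) = Fg'(F) - g(F)$ and
$F^2g''(F) - h(F) = F^2g''(F) - Fg'(F) + g(F)$ are positive. □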



The case of greatest interest here is the entropy functional $S(F)$, defined
in (1.2). In this case, $g(t) = t\ln t$, so that $tg'(t) - g(t) = t$ and
$t^2g''(t) - tg'(t) + g(t) = 0$. Hence from (2.30),

(2.31)   $\left. \frac{d^2}{dt^2}S(F_t) \right|_{t=0} = \int_{\mathbb{R}^d} \|\mathrm{Hess}\,\eta\|^2F_0(v)\,dv$ .

This shows that the entropy is convex, as proved in [18], though not strictly
convex. Consider the following example¹ in one dimension: Let

   $\psi(v) = \frac{|v|^2}{2} + |v|$ .

¹We thank the referee for this example, which has clarified the formulation of Corollary 2.5
below.
For any $F_0$, define $F_t = \nabla\psi_t\#F_0$ and then it is easy to see that

(2.32)   $F_t(v) = 1_{\{v < -t\}}F_0(v+t) + 1_{\{v > t\}}F_0(v-t)$ .

The geodesic $t \mapsto F_t$ can be continued indefinitely for positive $t$, but unless $F_0$
vanishes in some strip $-\epsilon < v < \epsilon$, it cannot be continued at all for negative
$t$. With $F_t$ defined as in (2.32), $S(F_t) = S(F_0)$ for all $t$.

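There are however interesting cases in which the entropy is strictly convex
along a geodesic, and even uniformly so: Suppose that the "center of mass"
$\int_{\mathbb{R}^d} vF_t(v)\,dv$ is constant along the geodesic $t \mapsto F_t$, which means that

(2.33)   $\int_{\mathbb{R}^d} \nabla\eta(v)F_0(v)\,dv = 0$

where as above, $\nabla\eta$ is the tangent vector generating the geodesic.
The Poincaré constant $\alpha(F)$ of a density $F$ in $\mathcal{P}$ is defined by

(2.34)   $\alpha(F) = \inf_{\phi} \frac{\int_{\mathbb{R}^d} |\nabla\phi(v)|^2F(v)\,dv}{\int_{\mathbb{R}^d} \left| \phi(v) - \int_{\mathbb{R}^d} \phi(w)F(w)\,dw \right|^2F(v)\,dv}$ .

Thus, when (2.33) holds, with $\phi = \partial\eta/\partial v_i$ for $i = 1, \dots, d$ we take the sum,
yielding

(2.35)   $\int_{\mathbb{R}^d} \|\mathrm{Hess}\,\eta\|^2F_0(v)\,dv \ge \alpha(F_0)\int_{\mathbb{R}^d} |\nabla\eta(v)|^2F_0(v)\,dv$ ,

which provides a lower bound to the right side of (2.31) in terms of the
Riemannian metric.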

Now consider a "smooth" geodesic through a smooth density $F_0$, as in the
previous proof, and such that (2.33) is satisfied. Then by (2.31) and (2.35),
for any $t$ and $h > 0$ such that $F_{t-h}$ and $F_{t+h}$ are both on the geodesic,

   $\frac{1}{h^2}\left( S(F_{t+h}) + S(F_{t-h}) - 2S(F_t) \right) \ge \alpha(F_t)\int_{\mathbb{R}^d} |\nabla\eta(v)|^2F_0(v)\,dv$ .

If the geodesic is parametrized by arclength, then the last factor on the right
is one.

Summarizing the last paragraphs, we have the following corollary: 

Corollary 2.5 (Strict convexity of entropy). Consider a geodesic $s \mapsto F_s$
parametrized by arc length $s$, and defined for some interval $a < s < b$
such that $s \mapsto \int_{\mathbb{R}^d} vF_s(v)\,dv$ is constant, and such that each $F_s$ is bounded and
continuously differentiable. Then for all $s$ and $h$ so that $a < s - h$, $s + h < b$,

(2.36)   $S(F_{s+h}) + S(F_{s-h}) - 2S(F_s) \ge h^2\alpha(F_s)$ ,

where $\alpha(F_s)$ is the Poincaré constant of the density $F_s$.

(Notice that for the geodesic (2.32), $\alpha(F_t) = 0$ for all $t > 0$, as long as $F_0$
has positive mass on both sides of the origin, in addition to the fact that $F_t$
will not in general be smooth.)
We remark that Caffarelli has recently shown [6] that if $F_0$ is a Gaussian
density, and $F_1 = e^{-V}F_0$ where $V$ is convex, then there is an upper bound
on the Hessian of the potential $\psi$ for which $\nabla\psi\#F_0 = F_1$. This upper bound
is inherited by $\psi_t$ for all $t$. Since as Caffarelli shows, an upper bound on the
Hessian of $\psi_t$ and a lower bound on the Poincaré constant for $F_0$ imply a lower
bound on the Poincaré constant of $F_t$, one obtains a uniform lower bound on
the Poincaré constant for $F_t$, $0 \le t \le 1$. Hence $S(F_t)$ is uniformly strictly
convex along such a geodesic.



3. Geometry of the constraint manifold 

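Let $u$ and $\theta > 0$ be given. Consider the subset $\mathcal{E}_{u,\theta}$ of $\mathcal{P}$ specified by

(3.1)   $\mathcal{E}_{u,\theta} = \left\{ F \in \mathcal{P} \;\middle|\; \frac{1}{d}\int_{\mathbb{R}^d} |v-u|^2F(v)\,dv = \theta \ \text{ and } \ \int_{\mathbb{R}^d} vF(v)\,dv = u \right\}$ .

This is the set of all probability densities with mean $u$ and variance $d\theta$.
We will often write $\mathcal{E}$ in place of $\mathcal{E}_{u,\theta}$ when $u$ and $\theta$ are clear from the context
or simply irrelevant.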

We give a fairly complete description of the geometry of $\mathcal{E}$, both locally
and globally. In particular, we obtain a closed form expression for the distance
between any two points on $\mathcal{E}$ in the metric induced by the 2-Wasserstein metric,
and a global description of the geodesics in $\mathcal{E}$.

Notice that

(3.2)   $\mathcal{E}_{u,\theta} \subset \left\{ F \;\middle|\; W_2^2(F,\delta_u) = \frac{d\theta}{2} \right\}$

where $\delta_u$ is the unit mass at $u$. This is quite clear from the transport point of
view: If our target distribution is a point mass, there are no choices to make;
everything is simply transported to the point $u$. Hence $\mathcal{E}_{u,\theta}$ is a part of a sphere
in the 2-Wasserstein metric, centered on $\delta_u$, and with a radius of $\sqrt{d\theta/2}$.

Our first theorem shows that for any $F_0$ in $\mathcal{P}$, there is a unique closest $F$
in $\mathcal{E}$, and this is obtained by dilatation and translation. This is the first of two
related variational problems solved in this section.

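Theorem 3.1 (Projection onto $\mathcal{E}$). Let $F_0$ be any probability density on
$\mathbb{R}^d$ such that

   $\int_{\mathbb{R}^d} vF_0(v)\,dv = u_0$   and   $\int_{\mathbb{R}^d} |v - u_0|^2F_0(v)\,dv = d\theta_0$ .

Let $\theta > 0$ and $u$ be given, and set $a = \sqrt{\theta_0/\theta}$. Then

   $\inf\left\{ W_2^2(G,F_0) \;\middle|\; G \in \mathcal{E}_{u,\theta} \right\}$

is attained at

   $F(v) = a^dF_0\left( a(v-u) + u_0 \right)$ ,

and the minimum value is

(3.3)   $W_2^2(F_0,F) = d\,\frac{\left( \sqrt{\theta_0} - \sqrt{\theta} \right)^2}{2} + \frac{|u - u_0|^2}{2}$ .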

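Proof. There is no loss of generality in fixing $u = 0$ in the proof since if
$u_0$ is arbitrary, a translation of both $F$ and $F_0$ yields the general result.

Let $\phi$ be defined by $\phi(v) = |v - u_0|^2/(2a)$ so that $(\nabla\phi)\#F_0 = F$. Let
$\psi(w) = a|w|^2/2 + w \cdot u_0$ be the dual convex function so that

   $\phi(v) + \psi(w) \ge v \cdot w$ ,

and hence

(3.4)   $\frac{1}{2}|v-w|^2 \ge \frac{a|v|^2 - |v-u_0|^2}{2a} + \frac{(1-a)|w|^2 - 2w \cdot u_0}{2}$

for all $v$ and $w$.

Next, given any $G$ in $\mathcal{E}$, let $\gamma$ be the optimal coupling of $F_0$ and $G$ so that

   $W_2^2(F_0,G) = \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|v-w|^2\,\gamma(dv,dw)$ .

Then by (3.4),

   $W_2^2(F_0,G) \ge \frac{(a-1)^2\,d\theta}{2} + \frac{|u_0|^2}{2}$ .

On the other hand, since $(\nabla\phi)\#F_0 = F$,

$$\begin{aligned}
W_2^2(F_0,F) &= \int_{\mathbb{R}^d} \frac{1}{2}|v - \nabla\phi(v)|^2F_0(v)\,dv \\
&= \left( \frac{1}{a} - 1 \right)^2 \int_{\mathbb{R}^d} \frac{1}{2}|v - u_0|^2F_0(v)\,dv + \frac{|u_0|^2}{2} \\
&= \frac{(a-1)^2\,d\theta}{2} + \frac{|u_0|^2}{2} . \qquad \square
\end{aligned}$$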
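The projection in Theorem 3.1 has a direct sample-based analogue, sketched below for $d = 1$. This is an illustration rather than the paper's construction: an empirical sample stands in for $F_0$, the map $v \mapsto u + (v - u_0)/a$ with $a = \sqrt{\theta_0/\theta}$ is applied to each atom, and the resulting transport cost is compared with $\frac{d}{2}(\sqrt{\theta_0} - \sqrt{\theta})^2 + \frac{1}{2}|u - u_0|^2$. The helper names, the sample size and the target values of $u$ and $\theta$ are arbitrary assumptions.

```python
import math
import random


def mean_and_theta(xs):
    """Mean u and theta = variance / d of a one-dimensional sample (d = 1)."""
    n = len(xs)
    u = sum(xs) / n
    theta = sum((x - u) ** 2 for x in xs) / n
    return u, theta


def project_onto_constraint(xs, u, theta):
    """Apply v -> u + (v - u0) / a with a = sqrt(theta0 / theta) to each atom."""
    u0, theta0 = mean_and_theta(xs)
    a = math.sqrt(theta0 / theta)
    return [u + (x - u0) / a for x in xs]


def w2_squared_1d(xs, ys):
    """(1/2) * mean squared gap between sorted samples (1-d optimal coupling)."""
    pairs = zip(sorted(xs), sorted(ys))
    return sum(0.5 * (x - y) ** 2 for x, y in pairs) / len(xs)


rng = random.Random(0)
sample = [rng.gauss(1.0, 2.0) for _ in range(1000)]
u0, theta0 = mean_and_theta(sample)

u, theta = -0.5, 0.3                       # target mean and temperature
projected = project_onto_constraint(sample, u, theta)

distance = w2_squared_1d(sample, projected)
predicted = (0.5 * (math.sqrt(theta0) - math.sqrt(theta)) ** 2
             + 0.5 * (u - u0) ** 2)
```

Because the map is increasing, pairing sorted atoms reproduces it exactly, so `distance` and `predicted` agree up to rounding for the empirical measure.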

Remark (Exact solution for the JKO time discretization of the heat equation
for Gaussian initial data). Theorem 3.1 allows us to solve exactly the
Jordan-Kinderlehrer-Otto time discretization of the heat equation for Gaussian
initial data. Take as initial data $F_0(v) = (4\pi t_0)^{-d/2}e^{-|v|^2/4t_0}$. We can now
find $\inf\{ W_2^2(F,F_0) + hS(F) \}$ in two steps. First, consider

(3.5)   $\inf\left\{ W_2^2(F, F_0) + hS(F) \;\middle|\; F \in \mathcal{E}_{0,2t} \right\}$ .

Now on $\mathcal{E}_{0,2t}$, $S$ has a global minimum at $G_t = (4\pi t)^{-d/2}e^{-|v|^2/4t}$, as is well
known. By Theorem 3.1, $W_2^2(F, F_0)$ also has a global minimum on $\mathcal{E}_{0,2t}$ at
$G_t$, since $G_t$ is just a rescaling of $F_0$. Therefore, by (3.3), the infimum in (3.5)
is

   $W_2^2(G_t,F_0) + hS(G_t) = d\left( \sqrt{t} - \sqrt{t_0} \right)^2 - \frac{hd}{2}\left( \ln(4\pi t) + 1 \right)$ .

In the second step, we simply compute the minimizing value of $t$, which
amounts to finding the value of $t$ that minimizes

   $\left( \sqrt{t} - \sqrt{t_0} \right)^2 - \frac{h}{2}\ln t$ .
Simple computations lead to the value $t = f(t_0)$ where

(3.6)   $f(t_0) = \frac{1}{2}\left( t_0 + h + \sqrt{t_0^2 + 2ht_0} \right)$ .

Note that $t_0 < f(t_0) < t_0 + h$, but $f(t_0) = t_0 + h + O(h^2)$. If we then
inductively define $t_n = f(t_{n-1})$, we see that the exact solution of the
Jordan-Kinderlehrer-Otto time discretization of the heat equation is given at time
step $n$ by $F_n = (4\pi t_n)^{-d/2}e^{-|v|^2/4t_n}$ where $t_n = t_0 + nh + O(h^2)$. Note that
in the discrete time approximation, the variance increases more slowly than
in continuous time, since the $O(h^2)$ term is negative, though of course the
difference in the rates vanishes as $h$ tends to zero.
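The recursion $t_n = f(t_{n-1})$ is simple to iterate. The sketch below assumes the closed form $f(t) = \frac{1}{2}\left( t + h + \sqrt{t^2 + 2ht} \right)$, which is what setting the derivative of $(\sqrt{s} - \sqrt{t})^2 - \frac{h}{2}\ln s$ with respect to $s$ to zero gives; the function names and the numerical values of $t_0$, $h$ and the number of steps are illustrative assumptions.

```python
import math


def jko_step(t_prev, h):
    """One exact JKO step for centered Gaussian data, under the assumed
    closed form f(t) = (t + h + sqrt(t^2 + 2 h t)) / 2."""
    return 0.5 * (t_prev + h + math.sqrt(t_prev * t_prev + 2.0 * h * t_prev))


def jko_times(t0, h, n_steps):
    """Return [t_0, t_1, ..., t_n] with t_k = f(t_{k-1})."""
    times = [t0]
    for _ in range(n_steps):
        times.append(jko_step(times[-1], h))
    return times


t0, h, n_steps = 1.0, 0.01, 100
times = jko_times(t0, h, n_steps)

# The heat equation itself would give t0 + n h; the discrete scheme trails it.
lag = (t0 + n_steps * h) - times[-1]
```

Expanding the square root gives $f(t) = t + h - h^2/(4t) + O(h^3)$, so each step falls short of $t + h$ by roughly $h^2/(4t)$; the accumulated shortfall is what `lag` measures.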

Returning to the main focus of this section, fix two densities $F_0$ and $F_1$
in $\mathcal{E}$. Let $\psi$ be the convex function on $\mathbb{R}^d$ such that $(\nabla\psi)\#F_0 = F_1$. Then by
Theorem 2.2, the geodesic that runs from $F_0$ to $F_1$ through the ambient space
$\mathcal{P}$ is given by

   $F_t = \left( (1-t)v + t\nabla\psi \right)\#F_0$ .

Thinking of $\mathcal{E}$ as a subset of a sphere, and this geodesic as the chord connecting
two points on the sphere, we refer to it as the chordal geodesic from $F_0$ to $F_1$.

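Lemma 3.2 (Variance along a chordal geodesic). Let $F_0$ and $F_1$ be any
two densities in $\mathcal{E}$. Let $t \mapsto F_t$ be the chordal geodesic joining them. Then for
all $t$ with $0 \le t \le 1$,

(3.7)   $\frac{1}{d}\int_{\mathbb{R}^d} |v-u|^2F_t(v)\,dv = \theta\left[ 1 - 4t(1-t)\frac{W_2^2(F_0,F_1)}{2d\theta} \right] = \theta\left[ 1 - t(1-t)\frac{W_2^2(F_0,F_1)}{R_\theta^2} \right]$ ,

where $R_\theta = \sqrt{d\theta/2}$.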
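Proof. Notice first that with $F_1 = \nabla\psi\#F_0$, we have from Theorem 2.2
that

(3.8)
$$\begin{aligned}
\int_{\mathbb{R}^d} \frac{1}{2}|v-u|^2F_t(v)\,dv
&= \int_{\mathbb{R}^d} \frac{1}{2}\left| \left( (1-t)v + t\nabla\psi(v) \right) - u \right|^2F_0(v)\,dv \\
&= \int_{\mathbb{R}^d} \frac{1}{2}\left| (1-t)(v-u) + t(\nabla\psi(v) - u) \right|^2F_0(v)\,dv \\
&= (1-t)^2\int_{\mathbb{R}^d} \frac{1}{2}|v-u|^2F_0(v)\,dv + t^2\int_{\mathbb{R}^d} \frac{1}{2}|w-u|^2F_1(w)\,dw \\
&\qquad + t(1-t)\int_{\mathbb{R}^d} (v-u) \cdot (\nabla\psi(v) - u)F_0(v)\,dv \\
&= \frac{d\theta}{2}(1-t)^2 + \frac{d\theta}{2}t^2 + t(1-t)\int_{\mathbb{R}^d} (v-u) \cdot (\nabla\psi(v) - u)F_0(v)\,dv .
\end{aligned}$$

Next,

$$\begin{aligned}
W_2^2(F_0,F_1) &= \frac{1}{2}\int_{\mathbb{R}^d} |v - \nabla\psi(v)|^2F_0(v)\,dv \\
&= \frac{1}{2}\int_{\mathbb{R}^d} |v-u|^2F_0(v)\,dv + \frac{1}{2}\int_{\mathbb{R}^d} |\nabla\psi(v) - u|^2F_0(v)\,dv
 - \int_{\mathbb{R}^d} (v-u) \cdot (\nabla\psi(v) - u)F_0(v)\,dv \\
&= d\theta - \int_{\mathbb{R}^d} (v-u) \cdot (\nabla\psi(v) - u)F_0(v)\,dv
\end{aligned}$$

by the definition of $\mathcal{E}$, and hence

(3.9)   $\int_{\mathbb{R}^d} (v-u) \cdot (\nabla\psi(v) - u)F_0(v)\,dv = d\theta - W_2^2(F_0, F_1)$ .

Combining (3.9) and (3.8), one has the result. □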

We note that since $\int_{\mathbb{R}^d} (v-u)F_0(v)\,dv = 0$,

   $\int_{\mathbb{R}^d} (v-u) \cdot (\nabla\psi(v) - u)F_0(v)\,dv = \int_{\mathbb{R}^d} (v-u) \cdot (\nabla\psi(v) - \nabla\psi(u))F_0(v)\,dv \ge 0$

by the convexity of $\psi$. It follows from this and (3.9) that

(3.10)   $W_2^2(F_0,F_1) \le d\theta = 2R_\theta^2$ ,

where $R_\theta = \sqrt{d\theta/2}$ is the radius of $\mathcal{E}$ as in (3.2). Hence the variance in (3.7)
is never smaller than $R_\theta^2$.
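The variance formula of Lemma 3.2 can be checked on samples. The sketch below is an illustration for $d = 1$, with two empirical measures rescaled to share the mean $u$ and variance $\theta$; it pairs sorted atoms (an optimal coupling in one dimension), interpolates linearly along the chord, and compares the resulting variance with $\theta\left[ 1 - 4t(1-t)W_2^2(F_0,F_1)/(2d\theta) \right]$. The choice of distributions, the sample size and all names are assumptions made for the example.

```python
import math
import random


def standardize(xs, u, theta):
    """Affinely rescale a 1-d sample to mean u and variance theta (d = 1),
    returning the atoms in increasing order."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sorted(u + math.sqrt(theta) * (x - m) / s for x in xs)


rng = random.Random(1)
u, theta, n = 0.7, 2.0, 500
f0 = standardize([rng.gauss(0.0, 1.0) for _ in range(n)], u, theta)
f1 = standardize([rng.expovariate(1.0) for _ in range(n)], u, theta)

# Sorted atoms are optimally coupled in 1-d; keep the factor 1/2 of (1.1).
w2_sq = sum(0.5 * (x - y) ** 2 for x, y in zip(f0, f1)) / n


def variance_on_chord(t):
    """(1/d) * integral of |v - u|^2 against F_t for the interpolated sample."""
    return sum(((1 - t) * x + t * y - u) ** 2 for x, y in zip(f0, f1)) / n


def predicted_variance(t, d=1):
    """theta * [1 - 4 t (1 - t) W_2^2 / (2 d theta)], as in Lemma 3.2."""
    return theta * (1 - 4 * t * (1 - t) * w2_sq / (2 * d * theta))


worst_gap = max(abs(variance_on_chord(k / 10) - predicted_variance(k / 10))
                for k in range(11))
```

The dip is largest at $t = 1/2$, where the chord passes closest to the center of the sphere; the bound $W_2^2(F_0,F_1) \le d\theta$ keeps the variance there at or above $\theta/2$.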

The next result is the second of the variational problems solved in this
section, and is the key to the determination of the geodesics in $\mathcal{E}$.

Theorem 3.3 (Midpoint theorem). Let $F_0$ and $F_1$ be any two densities
in $\mathcal{E}$. Then

(3.11)   $\inf_{G \in \mathcal{E}} \left\{ W_2^2(F_0,G) + W_2^2(G,F_1) \right\}$

is attained uniquely at $a^dF_{1/2}\left( a(v-u) + u \right)$ where $F_{1/2}$ is the midpoint of the
chordal geodesic, and $a$ is chosen to rescale the midpoint onto $\mathcal{E}$; i.e.,

(3.12)   $a = \sqrt{1 - \frac{W_2^2(F_0,F_1)}{2d\theta}} = \sqrt{1 - \frac{W_2^2(F_0,F_1)}{4R_\theta^2}}$ ,

where $R_\theta = \sqrt{d\theta/2}$ is the radius of $\mathcal{E}$ as in (3.2). Moreover, the minimal value
attained in (3.11) is $f\left( W_2^2(F_0,F_1) \right)$ where

(3.13)   $f(x) = 2d\theta\left[ 1 - \sqrt{1 - x/(2d\theta)} \right]$ .

The function $f$ is convex and increasing on $[0, 2d\theta]$.

Before giving the proof itself, we first consider some formal arguments 
that serve to identify the minimizer and motivate the proof. 

Let $\Phi(G)$ denote the functional being minimized in (3.11). This functional is strictly convex with respect to the usual convex structure on $\mathcal{E}$; that is, for all $\lambda$ with $0 < \lambda < 1$, and all $G_0$ and $G_1$ in $\mathcal{E}$,

$$\Phi(\lambda G_0 + (1-\lambda)G_1) \le \lambda\Phi(G_0) + (1-\lambda)\Phi(G_1)$$

with equality only if $G_0 = G_1$. The strict convexity suggests that there is a minimizer $G_0$, and that if we can find any critical point $G$ of $\Phi$ then $G$ is the minimizer $G_0$.

To make variations in $G$, seeking a critical point, let $\eta$ be a smooth, rapidly decaying function on $\mathbb{R}^d$, and define the map $T_t : \mathbb{R}^d \to \mathbb{R}^d$ by $T_t(v) = v + t\nabla\eta(v)$. Let $G_t = T_t\#G_0$. We want the curve $t \mapsto G_t$ to be tangent to $\mathcal{E}$ at $t = 0$, and so we require in particular that

$$\int_{\mathbb{R}^d} v\cdot\nabla\eta(v)\,G_0(v)\,dv = 0 \tag{3.14}$$

which guarantees that $\int |v|^2 G_t\,dv = \int |v|^2 G_0\,dv + O(t^2)$.

Let $\phi$ be the convex function such that $\nabla\phi\#G_0 = F_0$, and let $\tilde\phi$ be the convex function such that $\nabla\tilde\phi\#G_0 = F_1$. The variation in $\Phi(G_t)$ can be expressed in terms of $\phi$, $\tilde\phi$ and $\eta$ as follows: Formally, assuming enough regularity, we have

$$\lim_{t\to 0^+}\frac{\Phi(G_t) - \Phi(G_0)}{t} = \int_{\mathbb{R}^d} \big(\nabla\phi(v) + \nabla\tilde\phi(v) - 2v\big)\cdot\nabla\eta(v)\,G_0(v)\,dv . \tag{3.15}$$

(A more precise statement and explanation are provided in Section 4 where 
we make actual use of such variations. For the present heuristic purposes it 
suffices to be formal.) 
Combining (3.14) and (3.15), we see that the formal condition for $G_0$ to be a critical point is

$$\nabla\phi(v) + \nabla\tilde\phi(v) = Cv \tag{3.16}$$

for some constant $C$.

The formal argument tells us what to look for, namely a $G_0$ such that (3.16) holds. It is easy to see, if $G_0$ is the midpoint of the chordal geodesic from $F_0$ to $F_1$ projected onto $\mathcal{E}$ by rescaling as in Theorem 3.1, that $G_0$ satisfies (3.16). The actual proof of the theorem consists of two steps: First we verify the assertion just made about $G_0$ so defined. Then we prove, using (3.16), that $G_0$ is indeed the minimizer using a duality argument very much like the one used to prove Theorem 3.1.

Proof of Theorem 3.3. First, we may assume that $u = 0$. Next, let $\psi$ be the convex function such that $\nabla\psi\#F_0 = F_1$. We may suppose initially that both $F_0$ and $F_1$ are strictly positive so that $\psi$ will be convex on all of $\mathbb{R}^d$. Recall that $\nabla(\psi_{1/2})^*\#F_{1/2} = F_0$, and that by (2.10), $\nabla((\psi^*)_{1/2})^*\#F_{1/2} = F_1$. Then immediately from (2.9) we have

$$(\psi_{1/2})^*(v) + ((\psi^*)_{1/2})^*(v) = |v|^2 . \tag{3.17}$$

Now let $a$ be given by (3.12), and define

$$\phi(v) = \frac1a(\psi_{1/2})^*(av) \quad\text{and}\quad \tilde\phi(v) = \frac1a((\psi^*)_{1/2})^*(av) .$$

Then $\nabla\phi\#G_0 = F_0$ and $\nabla\tilde\phi\#G_0 = F_1$, and from (3.17),

$$\phi(v) + \tilde\phi(v) = a|v|^2 . \tag{3.18}$$

To use this, observe that for any dual pair of convex functions $\eta$ and $\eta^*$, Young's inequality says that $\eta(v) + \eta^*(w) \ge v\cdot w$. Hence for all $v$ and $w$,

$$\frac12|v-w|^2 \ge \frac12|v|^2 + \frac12|w|^2 - \eta(v) - \eta^*(w) .$$

Now if $G$ is any element of $\mathcal{E}$, and $\gamma_0$ is the optimal coupling between $G$ and $F_0$, we have

$$W_2^2(G,F_0) = \int \frac12|v-w|^2\,\gamma_0(dv,dw) \ge d\theta - \int_{\mathbb{R}^d}\eta(v)G(v)\,dv - \int_{\mathbb{R}^d}\eta^*(w)F_0(w)\,dw . \tag{3.19}$$

In the same way, we deduce that for any other dual pair of convex functions $\zeta$ and $\zeta^*$,

$$W_2^2(G,F_1) = \int \frac12|v-w|^2\,\gamma_1(dv,dw) \ge d\theta - \int_{\mathbb{R}^d}\zeta(v)G(v)\,dv - \int_{\mathbb{R}^d}\zeta^*(w)F_1(w)\,dw . \tag{3.20}$$
We now choose $\eta = \phi$ and $\zeta = \tilde\phi$. Then adding (3.19) and (3.20), and on account of (3.18),

$$\begin{aligned}
\Phi(G) &= W_2^2(G,F_0) + W_2^2(G,F_1) \\
&\ge 2d\theta - \int_{\mathbb{R}^d}\big(\phi(v) + \tilde\phi(v)\big)G(v)\,dv - \int_{\mathbb{R}^d}\phi^*(w)F_0(w)\,dw - \int_{\mathbb{R}^d}\tilde\phi^*(w)F_1(w)\,dw \\
&= (2-a)d\theta - \int_{\mathbb{R}^d}\phi^*(w)F_0(w)\,dw - \int_{\mathbb{R}^d}\tilde\phi^*(w)F_1(w)\,dw .
\end{aligned} \tag{3.21}$$

Now suppose that $G = G_0$. Then for $\gamma_0$-almost every $(v,w)$, we have that $v\cdot w = \phi(v) + \phi^*(w)$ so that

$$\frac12|v-w|^2 = \frac12|v|^2 + \frac12|w|^2 - \phi(v) - \phi^*(w)$$

and hence there is equality in (3.19) when $G = G_0$ and $\eta = \phi$. In the same way, there is equality in (3.20) when $G = G_0$ and $\zeta = \tilde\phi$. Thus, the lower bound in (3.21) is saturated for $G = G_0$, and is in any case independent of $G$. This proves that $G_0$ is the minimizer.

It is now easy to compute the minimizing value. Theorem 3.1 tells us that $G_0(v) = a^d F_{1/2}(av)$ where $a$ depends only on $W_2(F_0,F_1)$, and is given explicitly by (3.12). Then, with this choice of $a$,

$$\frac1a\nabla\psi_{1/2}\#F_0 = G_0 .$$

Expressing this directly in terms of $\psi$ and computing in the familiar way, one finds

$$W_2^2(F_0,G_0) = \frac{d\theta}{a}\left[(a-1) + \frac{W_2^2(F_0,F_1)}{2d\theta}\right] = d\theta(1-a) . \tag{3.22}$$

Clearly, $W_2(F_0,G_0) = W_2(G_0,F_1)$, and so doubling the right-hand side of (3.22) and inserting our formula for $a$, we obtain (3.13). Finally simple calculations confirm that $f$ is increasing and convex on $[0, 2d\theta]$. □
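The formula (3.13) has an elementary counterpart on a circle: if two points on a circle of radius $R$ are a chordal distance $D$ apart, then twice the squared chord from either point to the midpoint of the arc equals $f(D^2)$ with $2d\theta$ replaced by $4R^2$. The sketch below checks this numerically; the function names and the sample values are illustrative assumptions, not part of the paper.

```python
import math

def midpoint_value(x, d, theta):
    """f(x) = 2 d theta [1 - sqrt(1 - x / (2 d theta))], as in (3.13)."""
    return 2 * d * theta * (1 - math.sqrt(1 - x / (2 * d * theta)))

def circle_check(angle, radius):
    """Return (f(D^2), 2 * c^2) for an arc subtending `angle` on a circle.

    D is the chord between the endpoints of the arc and c is the chord from
    an endpoint to the midpoint of the arc. The radius plays the role of
    R_theta, so d * theta is taken to be 2 * radius^2 (with d = 1).
    """
    chord = 2 * radius * math.sin(angle / 2)
    half_chord = 2 * radius * math.sin(angle / 4)
    theta = 2 * radius ** 2
    return midpoint_value(chord ** 2, 1, theta), 2 * half_chord ** 2

f_val, ref_val = circle_check(1.0, 0.7)
```

The agreement of the two numbers is the identity $1 - \cos(\alpha/2) = 2\sin^2(\alpha/4)$ in disguise, which is one way to see why the minimal value in Theorem 3.3 is exactly what spherical intuition predicts.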

We are now prepared to consider discrete approximations to geodesics in $\mathcal{E}$. Let $\mathcal{G}$ be the set of continuous maps $t \mapsto G_t$ from $[0,1]$ to $\mathcal{E}$ with $G_0 = F_0$ and $G_1 = F_1$.

For each natural number $k$, let $\mathcal{G}_k(F_0,F_1)$ denote the set of sequences

$$\{G_0, G_1, \dots, G_{2^k}\} \tag{3.23}$$

where each $G_j$ is in $\mathcal{E}$, $G_0 = F_0$, $G_{2^k} = F_1$, and finally

$$W_2^2(G_{j+2},G_{j+1}) = W_2^2(G_{j+1},G_j) \tag{3.24}$$

for all $j = 0, 1, \dots, 2^k - 2$.

For any path $t \mapsto G_t$ in $\mathcal{G}$ and any $k$, we obtain a sequence in $\mathcal{G}_k(F_0,F_1)$ by an appropriate selection of times $t_j$ and by setting $G_j = G(t_j)$.

We next obtain a particular element $\{F_0^{(k)}, F_1^{(k)}, \dots, F_{2^k}^{(k)}\}$ of $\mathcal{G}_k(F_0,F_1)$ by successive midpoint projections onto $\mathcal{E}$ as follows: For $k = 1$, let $F_0^{(1)} = F_0$ and $F_2^{(1)} = F_1$ as we must. Define $F_1^{(1)}$ to be the midpoint of the chordal geodesic from $F_0$ to $F_1$, projected onto $\mathcal{E}$ as in Theorem 3.3. Then, supposing $\{F_0^{(k)}, F_1^{(k)}, \dots, F_{2^k}^{(k)}\}$ to be defined, put $F_{2j}^{(k+1)} = F_j^{(k)}$ for $j = 0, 1, \dots, 2^k$. Also, for $j = 0, 1, \dots, 2^k - 1$, let $F_{2j+1}^{(k+1)}$ be the midpoint of the chordal geodesic from $F_j^{(k)}$ to $F_{j+1}^{(k)}$, projected onto $\mathcal{E}$ as in Theorem 3.3.

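Lemma 3.4 (Discrete geodesics). For all $k \ge 1$,

$$\sum_{j=0}^{2^k-1} W_2\big(F_j^{(k)},F_{j+1}^{(k)}\big) \le \sum_{j=0}^{2^k-1} W_2(G_j,G_{j+1})$$

for any $\{G_0, G_1, \dots, G_{2^k}\}$ in $\mathcal{G}_k(F_0,F_1)$, and there is equality when and only when $\{G_0, G_1, \dots, G_{2^k}\} = \{F_0^{(k)}, F_1^{(k)}, \dots, F_{2^k}^{(k)}\}$.

Proof. By condition (3.24),

$$\sum_{j=0}^{2^k-1} W_2^2(G_j,G_{j+1}) = 2^{-k}\left(\sum_{j=0}^{2^k-1} W_2(G_j,G_{j+1})\right)^2 . \tag{3.25}$$

We now claim that

$$\sum_{j=0}^{2^k-1} W_2^2\big(F_j^{(k)},F_{j+1}^{(k)}\big) \le \sum_{j=0}^{2^k-1} W_2^2(G_j,G_{j+1})$$

and there is equality exactly when $\{G_0, G_1, \dots, G_{2^k}\} = \{F_0^{(k)}, F_1^{(k)}, \dots, F_{2^k}^{(k)}\}$. On account of (3.25), once this is established, the proof is complete.

For $k = 1$, this is implied by Theorem 3.3. For $k > 1$, consider any $(2^k+1)$-tuple $\{G_0, G_1, \dots, G_{2^k}\}$ of elements of $\mathcal{E}$. We are not requiring $\{G_0, G_1, \dots, G_{2^k}\} \in \mathcal{G}_k$. The point is that we are going to reduce to the case $k = 1$ by successively erasing every other element. Even if $W_2(G_j,G_{j+1}) = W_2(G_{j+1},G_{j+2})$ for all $j$, it is not necessarily the case that $W_2(G_j,G_{j+2}) = W_2(G_{j+2},G_{j+4})$ for all $j$, so that the procedure of "erasing midpoints" does not take us from $\mathcal{G}_k$ to $\mathcal{G}_{k-1}$.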
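Nonetheless, without assuming that $\{G_0, G_1, \dots, G_{2^k}\} \in \mathcal{G}_k$, we have from Theorem 3.3, with $f$ given by (3.13), that

$$\begin{aligned}
\sum_{j=0}^{2^k-1} W_2^2(G_j,G_{j+1}) &\ge \sum_{\ell=0}^{2^{k-1}-1} f\big(W_2^2(G_{2\ell},G_{2\ell+2})\big) \\
&= 2^{k-1}\left(2^{-(k-1)}\sum_{\ell=0}^{2^{k-1}-1} f\big(W_2^2(G_{2\ell},G_{2\ell+2})\big)\right) \\
&\ge 2^{k-1} f\left(2^{-(k-1)}\sum_{\ell=0}^{2^{k-1}-1} W_2^2(G_{2\ell},G_{2\ell+2})\right)
\end{aligned} \tag{3.26}$$

where the last inequality is the convexity of $f$.

Notice that both inequalities are saturated if and only if for each $\ell$, $G_{2\ell+1}$ is the projected midpoint of the chordal geodesic connecting $G_{2\ell}$ and $G_{2\ell+2}$.

The proof is now easy to complete. Define a sequence $\{A_j\}$ inductively by $A_0 = W_2^2(F_0,F_1)$ and

$$A_{j+1} = 2^j f\big(2^{-j}A_j\big) . \tag{3.27}$$

Because these inequalities are saturated for $\{G_0, G_1, \dots, G_{2^k}\} = \{F_0^{(k)}, F_1^{(k)}, \dots, F_{2^k}^{(k)}\}$,

$$\sum_{j=0}^{2^k-1} W_2^2\big(F_j^{(k)},F_{j+1}^{(k)}\big) = A_k .$$

But a simple induction argument based on (3.26) shows that

$$\sum_{j=0}^{2^k-1} W_2^2(G_j,G_{j+1}) \ge A_k$$

with equality only in the stated case. □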

We can now define the distance $\mathcal{W}_2(F_0,F_1)$ on $\mathcal{E}$ induced by the 2-Wasserstein metric:

$$\mathcal{W}_2(F_0,F_1) = \lim_{k\to\infty}\sum_{j=0}^{2^k-1} W_2\big(F_j^{(k)},F_{j+1}^{(k)}\big) \tag{3.28}$$

where clearly the sequence on the right in (3.28) is increasing. In fact, Lemma 3.4 tells us that the geodesic from $F_0$ to $F_1$ on $\mathcal{E}$ is obtained by the following simple rule: Take the chordal geodesic $t \mapsto F_t$ from $F_0$ to $F_1$ in $\mathcal{P}$, and rescale each $F_t$ onto $\mathcal{E}$ as in Theorem 3.1. Then reparametrize this path in $\mathcal{E}$ so that it runs at constant speed. This is the geodesic. Note that this same procedure produces geodesics on the sphere $S^{d-1}$ in $\mathbb{R}^d$.
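The finite-dimensional analogue of this rule can be sketched in a few lines: points of a chord between two unit vectors are rescaled radially onto the unit sphere, and the resulting path stays on the great-circle arc but does not run at constant speed in the chord parameter. The function names and the test vectors are illustrative assumptions.

```python
import math

def chord_projection(p, q, t):
    """Point (1 - t) p + t q of the chord, rescaled radially onto the unit sphere."""
    x = [(1 - t) * a + t * b for a, b in zip(p, q)]
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]

def angle(p, q):
    """Great-circle distance between two unit vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))

p, q = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
m = chord_projection(p, q, 0.25)

# A point lies on the geodesic arc exactly when the triangle inequality is an equality.
detour = angle(p, m) + angle(m, q) - angle(p, q)

# The chord parameter is not arclength: t = 0.25 covers less than a quarter of the arc.
fraction_covered = angle(p, m) / angle(p, q)
```

Here `detour` vanishes up to rounding, while `fraction_covered` is about 0.205 rather than 0.25, which is why the reparametrization step is needed.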
It is now an easy matter to compute the distance $\mathcal{W}_2(F_0,F_1)$. One way is to compute $\lim_{k\to\infty}A_k$ for the sequence given by $A_0 = W_2^2(F_0,F_1)$ and (3.27). This is straightforward; it is easy to recognize the iteration as the same iteration one gets by dyadically rectifying an arc of the circle.
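The dyadic rectification can be tried out numerically. The sketch below assumes the reading $A_{j+1} = 2^j f(2^{-j}A_j)$ of (3.27), with $f$ as in (3.13); under that reading $A_k$ is the sum of the $2^k$ squared chord lengths, so by (3.25) the length of the inscribed polygon is $\sqrt{2^k A_k}$, and this is the quantity compared with the arc length $2R_\theta\arcsin\big(W_2/(2R_\theta)\big)$. The function names and the number of steps are illustrative choices.

```python
import math

def f(x, two_d_theta):
    """f from (3.13), rewritten as x / (1 + sqrt(1 - x / (2 d theta))).

    The rewriting is algebraically identical and avoids cancellation for small x.
    """
    return x / (1 + math.sqrt(1 - x / two_d_theta))

def dyadic_length(w2_squared, d, theta, steps=25):
    """Iterate A_{j+1} = 2^j f(2^{-j} A_j) from A_0 = W_2^2; return sqrt(2^k A_k)."""
    two_d_theta = 2 * d * theta
    a = w2_squared
    for j in range(steps):
        a = 2 ** j * f(a / 2 ** j, two_d_theta)
    return math.sqrt(2 ** steps * a)

def arc_length(w2_squared, d, theta):
    """2 R arcsin(D / (2 R)) with R = sqrt(d theta / 2) and D = W_2."""
    r = math.sqrt(d * theta / 2)
    return 2 * r * math.asin(math.sqrt(w2_squared) / (2 * r))

polygon = dyadic_length(1.5, 3, 1.0)
arc = arc_length(1.5, 3, 1.0)
```

With $d\theta = 3$ and $W_2^2 = 1.5$ the chord is half the diameter, so the arc subtends an angle of $\pi/3$ and both numbers should be close to $\sqrt{1.5}\,\pi/3$.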

We find it more enlightening to obtain an explicit parametrization of the 
corresponding geodesic, and to use the Riemannian metric for the 
2-Wasserstein distance. 

To begin the computation, let $\psi$ be the convex function such that $\nabla\psi\#F_0 = F_1$. We may assume without loss of generality that $u = 0$; this will simplify the computation. Then define $F_t$ as in (2.6) and (2.7), and let $\tilde F_t$ be the projection of $F_t$ onto $\mathcal{E}$ as in Theorem 3.1. Since $u = 0$,

$$\tilde F_t = \left(\frac{1}{a(t)}\nabla\psi_t\right)\#F_0$$

where $\psi_t$ is defined in terms of $\psi$ as usual and where

$$a(t) = \sqrt{1 - 4t(1-t)\frac{W_2^2(F_0,F_1)}{2d\theta}} .$$

Notice that the gradient vector field on $\mathbb{R}^d$ that represents the tangent vector $\partial\tilde F_t/\partial t$ has two terms: One is a rescaling of the gradient vector field on $\mathbb{R}^d$ that represents $\partial F_t/\partial t$, and the other generates a dilation to keep the path on $\mathcal{E}$.

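Next, we have from Theorem 2.3 that for any test function $\chi$ on $\mathbb{R}^d$, after some computation,

$$\frac{d}{dt}\int_{\mathbb{R}^d}\chi(v)\tilde F_t(v)\,dv = \int_{\mathbb{R}^d}\nabla\chi(v)\cdot\left[\frac{1}{a(t)}\nabla\eta_t(a(t)v) - \frac{\dot a(t)}{a(t)}v\right]\tilde F_t(v)\,dv ,$$

where $\eta_t$ is given by (2.21). Hence, from (2.25), we have

$$g\left(\frac{\partial\tilde F_t}{\partial t},\frac{\partial\tilde F_t}{\partial t}\right) = \frac12\int_{\mathbb{R}^d}\left|\frac{1}{a(t)}\nabla\eta_t(a(t)v) - \frac{\dot a(t)}{a(t)}v\right|^2\tilde F_t(v)\,dv = \frac{1}{2a^2(t)}\int_{\mathbb{R}^d}\left|\nabla\eta_t(v) - \frac{\dot a(t)}{a(t)}v\right|^2 F_t(v)\,dv .$$

By (2.23), $\int_{\mathbb{R}^d}|\nabla\eta_t(v)|^2F_t(v)\,dv = 2W_2^2(F_0,F_1)$, and clearly $\int_{\mathbb{R}^d}|v|^2F_t(v)\,dv = a^2(t)d\theta$. Finally, by Theorem 2.3 and familiar computations,

$$\begin{aligned}
\int_{\mathbb{R}^d}\big(\nabla\eta_t(v)\cdot v\big)F_t(v)\,dv &= \frac{1}{2t}\int_{\mathbb{R}^d}\Big(|\nabla(\psi_t)^*(v) - v|^2 + |v|^2 - |\nabla(\psi_t)^*(v)|^2\Big)F_t(v)\,dv \\
&= \frac{1}{2t}\Big(2W_2^2(F_0,F_t) + (a^2(t) - 1)d\theta\Big) = (2t-1)W_2^2(F_0,F_1) .
\end{aligned}$$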
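Putting all of this together, one has, after some algebra,

$$\begin{aligned}
g\left(\frac{\partial\tilde F_t}{\partial t},\frac{\partial\tilde F_t}{\partial t}\right) &= \frac{1}{2a^2(t)}\left[2W_2^2(F_0,F_1) + \left(\frac{\dot a(t)}{a(t)}\right)^2a^2(t)d\theta - 2\frac{\dot a(t)}{a(t)}(2t-1)W_2^2(F_0,F_1)\right] \\
&= \frac{W_2^2(F_0,F_1)}{a^4(t)}\left[1 - \frac{W_2^2(F_0,F_1)}{2d\theta}\right] .
\end{aligned}$$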
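Now we reparametrize to achieve constant unit speed. We take the map $t \mapsto \tau(t)$ to be differentiable and increasing. Then with $\tilde F_\tau = \tilde F_{t(\tau)}$,

$$g\left(\frac{\partial\tilde F_\tau}{\partial\tau},\frac{\partial\tilde F_\tau}{\partial\tau}\right) = g\left(\frac{\partial\tilde F_t}{\partial t},\frac{\partial\tilde F_t}{\partial t}\right)\left(\frac{dt}{d\tau}\right)^2 = 1 \tag{3.29}$$

provided

$$\frac{d\tau}{dt} = \frac{W_2(F_0,F_1)}{a^2(t)}\sqrt{1 - \frac{W_2^2(F_0,F_1)}{2d\theta}} .$$

This is solved by

$$\tau(t) = R_\theta\arctan\left(\frac{(2t-1)\,W_2(F_0,F_1)}{\sqrt{4R_\theta^2 - W_2^2(F_0,F_1)}}\right)$$

for which $\tau(1/2) = 0$ and

$$\mathcal{W}_2(F_0,F_1) = \tau(1) - \tau(0) = 2R_\theta\arctan\left(\frac{W_2(F_0,F_1)}{\sqrt{4R_\theta^2 - W_2^2(F_0,F_1)}}\right) . \tag{3.30}$$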

This has a very simple interpretation: Consider two points on a circle of radius $R$, and let $D$ be the length of the chord that they terminate. The arc joining them subtends an angle $2\varphi$ where

$$\tan(\varphi) = \frac{D}{\sqrt{4R^2 - D^2}}$$

and hence the length of the arc joining them is

$$2R\arctan\left(\frac{D}{\sqrt{4R^2 - D^2}}\right) . \tag{3.31}$$

Since $\sqrt{d\theta/2}$ is the radius $R_\theta$ of $\mathcal{E}$, in that this is the 2-Wasserstein distance from any point in $\mathcal{E}$ to the unit mass at $u$, and since $W_2(F_0,F_1)$ is the chordal separation of $F_0$ from $F_1$ in the 2-Wasserstein distance, we have that (3.31), with $R = \sqrt{d\theta/2}$ and $D = W_2(F_0,F_1)$, gives us $\mathcal{W}_2(F_0,F_1)$. It is somewhat simpler to express this in terms of sines instead of tangents. From (3.31) it is easy to deduce that

$$W_2(F_0,F_1) = 2R_\theta\sin\left(\frac{\mathcal{W}_2(F_0,F_1)}{2R_\theta}\right) , \tag{3.32}$$

$$\mathcal{W}_2(F_0,F_1) = 2R_\theta\arcsin\left(\frac{W_2(F_0,F_1)}{2R_\theta}\right) . \tag{3.33}$$

We summarize this in the following theorem:

Theorem 3.5 (Geometry of $\mathcal{E}$). Let $\mathcal{W}_2(F_0,F_1)$ denote the distance between any two points $F_0$ and $F_1$ of $\mathcal{E}$ in the metric induced on $\mathcal{E}$ by the 2-Wasserstein metric. Then $\mathcal{W}_2(F_0,F_1)$ is related to $W_2(F_0,F_1)$ through (3.32) and (3.33). Moreover, the geodesic on $\mathcal{E}$ between $F_0$ and $F_1$ is obtained from the chordal geodesic in $\mathcal{P}$ between $F_0$ and $F_1$ by the following procedure: Let $t \mapsto F_t$, $t \in [0,1]$, denote the chordal geodesic. Then, for each such $t$, let $\tilde F_t$ denote the unique point in $\mathcal{E}$ that is closest to $F_t$, which is simply obtained from $F_t$ by dilating about the mean $u$. This path, reparametrized to run at constant speed, is the geodesic on $\mathcal{E}$ between $F_0$ and $F_1$.
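The conversions between the chordal distance $W_2$ and the induced distance on the constrained manifold are elementary enough to state as code. The sketch below implements the arcsine and sine forms (3.33) and (3.32), together with the arctangent form (3.31) with $R = \sqrt{d\theta/2}$ and $D = W_2$; the function names and the sample values are illustrative assumptions.

```python
import math

def radius(d, theta):
    """R_theta = sqrt(d theta / 2)."""
    return math.sqrt(d * theta / 2)

def arc_from_chord(w2, d, theta):
    """Induced distance from the chordal distance W_2, arcsine form (3.33)."""
    r = radius(d, theta)
    return 2 * r * math.asin(w2 / (2 * r))

def chord_from_arc(arc, d, theta):
    """Chordal distance W_2 from the induced distance, sine form (3.32)."""
    r = radius(d, theta)
    return 2 * r * math.sin(arc / (2 * r))

def arc_from_chord_arctan(w2, d, theta):
    """Arctangent form (3.31) with R = R_theta and D = W_2."""
    r = radius(d, theta)
    return 2 * r * math.atan(w2 / math.sqrt(4 * r * r - w2 * w2))

w2, d, theta = 1.0, 3, 1.0
arc_sin_form = arc_from_chord(w2, d, theta)
arc_tan_form = arc_from_chord_arctan(w2, d, theta)
```

The two forms agree because $\sin\varphi = D/(2R)$ and $\tan\varphi = D/\sqrt{4R^2 - D^2}$ describe the same angle, and the induced distance is never smaller than the chordal one.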

This theorem strongly encourages one to think of $\mathcal{E}$ in spherical terms, though we see from (3.10) that the chordal distance between any two points on $\mathcal{E}$ is no more than $\sqrt2$ times the radius of $\mathcal{E}$, as given by (3.2), as on the spherical cap with the azimuthal angle $\varphi$ ranging over $0 \le \varphi \le \pi/4$.

We apply this to deduce a criterion for displacement convexity on the constrained manifold $\mathcal{E}$. We say that a functional $\Phi$ is displacement convex on $\mathcal{E}$ in case for all geodesics $t \mapsto G_t$ in $\mathcal{E}$, the function $t \mapsto \Phi(G_t)$ is convex. If the gradient vector field $\nabla\eta$ on $\mathbb{R}^d$ is the tangent vector at $t = 0$ to a geodesic $t \mapsto G_t$ in $\mathcal{E}$, we define

$$\widetilde{\operatorname{Hess}}\,\Phi(G_0)(\nabla\eta,\nabla\eta) = \frac{d^2}{dt^2}\Phi(G_t)\bigg|_{t=0} . \tag{3.34}$$

This should be compared with (2.26). The differences lie in the different classes of geodesics being considered in the two cases, as well as the fact that

$$\int_{\mathbb{R}^d} v\cdot\nabla\eta(v)\,G_0(v)\,dv = 0 \quad\text{and}\quad \int_{\mathbb{R}^d}\nabla\eta(v)\,G_0(v)\,dv = 0 \tag{3.35}$$

must hold for $\nabla\eta$ to represent a tangent vector to $\mathcal{E}$ at $G_0$.

Since we have determined the geodesics in $\mathcal{E}$, it is now a simple matter to determine a criterion for displacement convexity in $\mathcal{E}$.

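Theorem 3.6 (Displacement convexity in $\mathcal{E}$). Let $G \mapsto \Phi(G)$ be any functional of the form

$$\Phi(G) = \int_{\mathbb{R}^d} g(G(v))\,dv$$

where $g$ is twice continuously differentiable on $\mathbb{R}_+$. Define the function $h$ by $h(t) = tg'(t) - g(t)$. Suppose that $F \in \mathcal{E}_{u,\theta}$ is such that $h(F)$ is integrable, and that at $F$,

$$G \mapsto \operatorname{Hess}\Phi(G)(\nabla\eta,\nabla\eta)$$

is continuous in the 2-Wasserstein metric for all test functions $\eta$. Then

$$\widetilde{\operatorname{Hess}}\,\Phi(F)(\nabla\eta,\nabla\eta) = \operatorname{Hess}\Phi(F)(\nabla\eta,\nabla\eta) + \frac{d}{2R_\theta^2}\left(\int_{\mathbb{R}^d}h(F)\,dv\right)\left(\int_{\mathbb{R}^d}|\nabla\eta|^2F\,dv\right) , \tag{3.36}$$

where $R_\theta = \sqrt{d\theta/2}$ is the radius of $\mathcal{E}_{u,\theta}$, and $\nabla\eta$ is any gradient vector field satisfying (3.35). In particular, if $\Phi(F) = S(F)$ is the entropy $\int_{\mathbb{R}^d}\ln(F(v))F(v)\,dv$ of $F$,

$$\widetilde{\operatorname{Hess}}\,S(F)(\nabla\eta,\nabla\eta) = \operatorname{Hess}S(F)(\nabla\eta,\nabla\eta) + \frac{d}{2R_\theta^2}\int_{\mathbb{R}^d}|\nabla\eta|^2F\,dv , \tag{3.37}$$

and thus the entropy is uniformly convex on the constrained manifold $\mathcal{E}_{u,\theta}$.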

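Proof. Without loss of generality, suppose $u = 0$. For any $F \in \mathcal{E}$, let $t \mapsto \tilde G_t$ be a geodesic in $\mathcal{E}$ passing through $F$ with unit speed at $t = 0$. Pick $\delta > 0$ sufficiently small that $\tilde G_\delta$ and $\tilde G_{-\delta}$ are both defined. By definition $\mathcal{W}_2^2(\tilde G_{-\delta},\tilde G_\delta) = 4\delta^2$. Define $h > 0$ by $W_2^2(\tilde G_{-\delta},\tilde G_\delta) = 4h^2$. By Theorem 3.5,

$$h = R_\theta\sin\left(\frac{\delta}{R_\theta}\right) = \delta + O(\delta^3) . \tag{3.38}$$

Now let $t \mapsto G_t$ be the chordal geodesic, in $\mathcal{P}$, from $\tilde G_{-\delta}$ to $\tilde G_\delta$ parametrized so that $\tilde G_{-\delta} = G_{-h}$ and $\tilde G_\delta = G_h$. By Theorem 3.3, $\tilde G_0 = F$ is obtained from $G_0$ by dilation:

$$\tilde G_0(v) = a^dG_0(av) \tag{3.39}$$

where, by (3.12) with $W_2^2(\tilde G_{-\delta},\tilde G_\delta) = 4h^2$,

$$a = \sqrt{1 - \frac{h^2}{R_\theta^2}} . \tag{3.40}$$

Now

$$\frac{1}{\delta^2}\left(\frac12\big(\Phi(\tilde G_\delta) + \Phi(\tilde G_{-\delta})\big) - \Phi(\tilde G_0)\right) = \frac{h^2}{\delta^2}\,\frac{1}{h^2}\left(\frac12\big(\Phi(G_h) + \Phi(G_{-h})\big) - \Phi(G_0)\right) + \frac{\Phi(G_0) - \Phi(\tilde G_0)}{\delta^2} . \tag{3.41}$$

Next, since $\Phi(G_0) - \Phi(\tilde G_0) = a^d\int_{\mathbb{R}^d}g(a^{-d}F(v))\,dv - \int_{\mathbb{R}^d}g(F(v))\,dv$, it follows from (3.40) and the definition of $h$ that

$$\lim_{\delta\to0}\frac{\Phi(G_0) - \Phi(\tilde G_0)}{\delta^2} = \frac{d}{2R_\theta^2}\int_{\mathbb{R}^d}h(F(v))\,dv . \tag{3.42}$$

By (3.38), the continuity of $\operatorname{Hess}\Phi$ at $F$ and our previous definitions,

$$\lim_{\delta\to0}\frac{2}{h^2}\left(\frac12\big(\Phi(G_h) + \Phi(G_{-h})\big) - \Phi(G_0)\right) = \operatorname{Hess}\Phi(F)(\nabla\eta,\nabla\eta) .$$

Combining this, (3.42) and (3.41), we obtain (3.36) from which the rest of the result easily follows. □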

As an application, we deduce a strengthened form of an inequality due to Talagrand [21]. Let $G_0$ be a Gaussian density in $\mathcal{E}_{\theta,u}$. Let $F$ be any other density in $\mathcal{E}_{\theta,u}$. Let $F_s$ be the geodesic in $\mathcal{E}_{\theta,u}$ parametrized by arclength, starting at $F$ and going to $G_0$. Then by (3.37),

$$S(F) - S(G_0) = \int_0^{\mathcal{W}_2(G_0,F)}S'(F_s)\,ds = \int_0^{\mathcal{W}_2(G_0,F)}\left(S'(G_0) + \int_0^sS''(F_r)\,dr\right)ds \ge \frac{d}{2R_\theta^2}\mathcal{W}_2^2(G_0,F) .$$

We have used the fact that $S'(G_0) = 0$ since $S(F) \ge S(G_0)$ by the entropy-minimizing property of Gaussians. Also, since both $F$ and $G_0$ lie in $\mathcal{E}_{\theta,u}$, $S(F) - S(G_0) = H(F|G_0)$, the relative entropy of $F$ with respect to $G_0$. Therefore, since $R_\theta^{-2} = 2/(d\theta)$,

$$H(F|G_0) \ge \frac{1}{\theta}\mathcal{W}_2^2(G_0,F) ,$$

which is Talagrand's inequality, except that here $\mathcal{W}_2^2(G_0,F)$ replaces the smaller quantity $W_2^2(G_0,F)$.
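A one-dimensional example makes the strengthened inequality concrete. The sketch below takes $d = 1$, $u = 0$, $\theta = 1$, lets $F$ be the uniform density with variance $\theta$ and $G_0$ the Gaussian with the same variance, and compares the relative entropy with the squared chordal and induced distances. The choice of $F$, the quadrature rule and the variable names are illustrative assumptions; in one dimension $W_2^2$ (with its factor 1/2) is computed from quantile functions.

```python
import math
from statistics import NormalDist

theta = 1.0                          # common variance (d = 1, mean u = 0)
half_width = math.sqrt(3 * theta)    # uniform on [-half_width, half_width] has variance theta
gauss = NormalDist(0.0, math.sqrt(theta))

def uniform_quantile(p):
    """Quantile function of the uniform density F."""
    return half_width * (2 * p - 1)

def w2_squared(n=20000):
    """(1/2) * integral over (0, 1) of (Q_F - Q_G)^2, by the midpoint rule."""
    total = 0.0
    for i in range(n):
        p = (i + 0.5) / n
        total += (uniform_quantile(p) - gauss.inv_cdf(p)) ** 2
    return 0.5 * total / n

# Entropy in the sign convention used here: S(F) = integral of F ln F.
entropy_uniform = -math.log(2 * half_width)
entropy_gauss = -0.5 * math.log(2 * math.pi * math.e * theta)
relative_entropy = entropy_uniform - entropy_gauss

chord_sq = w2_squared()
r = math.sqrt(theta / 2)             # R_theta with d = 1
arc_sq = (2 * r * math.asin(math.sqrt(chord_sq) / (2 * r))) ** 2
```

In this example the relative entropy is about 0.18 while both squared distances are about 0.02, so the inequality holds with room to spare; the point of the comparison is only that `arc_sq` is the larger of the two distances.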

4. The Euler-Lagrange equation 

For fixed $h > 0$, and a given density $F_0 \in \mathcal{E}_{\theta,u}$, we seek to minimize the functional

$$I(F) = \frac{W_2^2(F_0,F)}{\theta} + hS(F) \tag{4.1}$$

subject to the constraint that $F \in \mathcal{E}_{\theta,u}$.

This functional is strictly convex and our constraints are convex, and 
hence if any minimizer does exist, it would also be unique. The existence issue 
will be settled in the next section. Here we shall derive the Euler-Lagrange
equation that would be satisfied by any minimizer in our variational problem,
and derive some consequences of satisfying this equation.

Theorem 4.1. Suppose that $F_1$ is a minimizer of the functional given in (4.1) subject to the constraint that $F_1$ has the same mean and variance as $F_0$. Let $\psi$ be the convex function on $\mathbb{R}^d$ such that

$$\nabla\psi\#F_1 = F_0 . \tag{4.2}$$

Then

$$\int_{\mathbb{R}^d}|\nabla\ln F_1|^2F_1(v)\,dv < \infty \tag{4.3}$$

and

$$\nabla\psi(v) = v + h\theta\nabla\left[\ln\frac{F_1}{M_{F_1}}\right] + (u-v)\frac{W_2^2(F_1,F_0)}{d\theta} \tag{4.4}$$

where for any $F \in \mathcal{P}$, $M_F$ denotes the isotropic Gaussian density with the same mean and variance as $F$.

Proof. Consider a function $\xi : \mathbb{R}^d \to \mathbb{R}^d$ satisfying

$$\int_{\mathbb{R}^d}\xi(v)F_1(v)\,dv = 0 \quad\text{and}\quad \int_{\mathbb{R}^d}(\xi(v)\cdot v)F_1(v)\,dv = 0 . \tag{4.5}$$

Then define the flow $T_t(v) = v + t\xi(v)$ and the curve of densities $G(t) = T_t\#F_1$. Finally, let $\tilde G(t)$ be the projection of $G(t)$ onto $\mathcal{E}$ as in Theorem 3.1. Let $u_1$ and $d\theta_1$ be the mean and variance of $F_1$. Then by Theorem 3.1, $\tilde G(t,v) = a(t)^dG\big(t,a(t)(v - u(t)) + u_1\big)$, where, by (4.5),

$$a(t) = 1 + O(t^2) \quad\text{and}\quad u(t) = u_1 + O(t^2) . \tag{4.6}$$

We can also write $\tilde G(t) = f_t\#F_1$ where $f_t(v) = (v + t\xi(v))/a(t)$.

The argument here is adapted from the corresponding argument in [12]. First, consider the entropy. By direct calculation and (4.6),

$$S(\tilde G(t)) - S(G(0)) = -t\int_{\mathbb{R}^d}F_1(v)\nabla\cdot\xi(v)\,dv + O(t^2)$$

and so

$$\lim_{t\to0^+}\frac{S(\tilde G(t)) - S(F_1)}{t} = -\int_{\mathbb{R}^d}F_1(v)\nabla\cdot\xi(v)\,dv .$$

To compute the variation in the 2-Wasserstein distance, note that since $f_t\#F_1 = \tilde G(t)$, $\nabla\psi\circ f_t^{-1}\#\tilde G(t) = F_0$. Thus

$$\begin{aligned}
W_2^2(\tilde G(t),F_0) &\le \frac12\int_{\mathbb{R}^d}|\nabla\psi\circ f_t^{-1}(v) - v|^2\,\tilde G(t,v)\,dv \\
&= \frac12\int_{\mathbb{R}^d}|\nabla\psi(v) - f_t(v)|^2F_1(v)\,dv \\
&\le W_2^2(F_1,F_0) - t\int_{\mathbb{R}^d}(\nabla\psi(v) - v)\cdot\xi(v)\,F_1(v)\,dv + o(t) .
\end{aligned}$$

We deduce that

$$\limsup_{t\to0^+}\frac{W_2^2(\tilde G(t),F_0) - W_2^2(F_1,F_0)}{t} \le -\int_{\mathbb{R}^d}(\nabla\psi(v) - v)\cdot\xi(v)\,F_1(v)\,dv . \tag{4.7}$$

Since $F_1$ is a minimizer, it now follows easily that

$$\int_{\mathbb{R}^d}\left((\nabla\psi(v) - v)\frac{F_1(v)}{\theta}\right)\cdot\xi(v)\,dv \le -h\int_{\mathbb{R}^d}F_1(v)\nabla\cdot\xi(v)\,dv$$

for all smooth and compactly supported $\xi$ satisfying (4.5). Since these conditions are still satisfied if $\xi$ is replaced by $-\xi$, we have that

$$\int_{\mathbb{R}^d}\left((\nabla\psi(v) - v)\frac{F_1(v)}{\theta}\right)\cdot\xi(v)\,dv = -h\int_{\mathbb{R}^d}F_1(v)\nabla\cdot\xi(v)\,dv$$

for all smooth and compactly supported $\xi$ satisfying (4.5). Hence

$$\frac{F_1(v)}{\theta}(\nabla\psi(v) - v) - h\nabla F_1(v) = (A + B(u-v))F_1(v) \tag{4.8}$$

for some vector $A$ and scalar $B$. It follows from this that (4.3) holds.

Integrating both sides of (4.8) in $v$, one learns that $A = 0$. If one takes the inner product of both sides with $(u-v)$, and then integrates, one learns $d\theta B = W_2^2(F_1,F_0)/\theta - dh$ since

$$\int_{\mathbb{R}^d}(\nabla\psi(v) - v)\cdot(u-v)\,F_1(v)\,dv = W_2^2(F_1,F_0) .$$

Combining this and (4.8), we obtain (4.4). □

Now still assuming that the minimizer $F_1$ exists, we ask what properties $F_1$ inherits from $F_0$. We shall show, using the fact that $F_1$ satisfies the Euler-Lagrange equation (4.4) and (4.2), that $F_1$ inherits some localization properties from $F_0$. Specifically, let $\zeta$ be a nonnegative, increasing convex function on $\mathbb{R}_+$ with the property that $\lim_{t\to\infty}\zeta(t)/t = \infty$ and that $\zeta(0) = 0$. Suppose that

$$\int_{\mathbb{R}^d}\zeta(|v|^2)F_0(v)\,dv = C < \infty . \tag{4.9}$$

This quantity provides a quantitative measure of the localization of $|v|^2F_0(v)$ in that

$$\int_{|v|^2\ge t}|v|^2F_0(v)\,dv \le \frac{t}{\zeta(t)}\,C ,$$

and the right-hand side tends to zero as $t$ increases. Here, we have used that $t \mapsto \zeta(t)/t$ is nondecreasing. If we knew that $F_1$ satisfied the same inequality, we would have a quantitative localization estimate on $F_1$. We shall see below that this is almost the case: The function $\zeta$ is modified slightly in passing from $F_0$ to $F_1$.

First, we need to explain where the original $\zeta$ comes from. We could take $\zeta(t) = (1+t)^{1+\epsilon}$ if we assumed that $F_0$ possessed more than second moments. Since we wish to make a statement about generic elements $F_0$ of $\mathcal{E}_{u,\theta}$, we use a minor variant of a lemma of de la Vallée-Poussin, which says that for any probability density $F_0$ with $\int_{\mathbb{R}^d}|v|^2F_0(v)\,dv < \infty$, there is a nonnegative, increasing convex function $\zeta$ on $\mathbb{R}_+$ with the property that $\lim_{t\to\infty}\zeta(t)/t = \infty$ such that (4.9) holds, and finally, that $\|\zeta''\|_\infty \le 1$. Everything up to the last condition is standard, though the usual construction of $\zeta$ is such that $\zeta''$ is a series of Dirac masses. We therefore sketch a short proof. Without loss of generality, we may suppose that $u = 0$ and $\theta = 1/d$.

Let

$$\lambda(t) = \int_{|v|^2\ge t}F_0(v)\,dv \quad\text{and}\quad \mu(t) = \int_{|v|^2\ge t}|v|^2F_0(v)\,dv$$

so that $1 = \int_{\mathbb{R}^d}|v|^2F_0(v)\,dv = \int_0^\infty\lambda(t)\,dt$, and that

$$\mu(t) = \int_t^\infty\lambda(u)\,du + t\lambda(t) \ge \sum_{n>t}\lambda(n) . \tag{4.10}$$

Here, we have used the layer cake representation theorem. Now define $t_k$ by $t_0 = 0$ and for $k \ge 1$, $t_k = \inf\{t \mid \mu(t) \le 2^{-k}\}$. Since $F_0(v)\,dv$ is absolutely continuous, $\mu(t_k) = 2^{-k}$. Then by (4.10),

$$1 = \sum_{k=1}^\infty\mu(t_k) \ge \sum_{k=1}^\infty\sum_{n>t_k}\lambda(n) = \sum_{n=1}^\infty g(n)\lambda(n) \tag{4.11}$$

where $g(0) = 0$ and for all $n \ge 1$, $g(n) = \max\{k \mid t_k < n\}$. Clearly, $\lim_{n\to\infty}g(n) = \infty$ and $g(n+1) \ge g(n)$. Next, set $h(0) = 0$ and for $n \ge 1$, define $h(n)$ recursively by $h(n) - h(n-1) = 1$ if $g(n) - g(n-1) > 0$, and $h(n) - h(n-1) = 0$ otherwise. Then

$$h(n) = \sum_{k=1}^n\big(h(k) - h(k-1)\big) \le \sum_{k=1}^n\big(g(k) - g(k-1)\big) = g(n)$$

but also clearly $\lim_{n\to\infty}h(n) = \infty$ since $g(n)$ must increase infinitely often.

Now define $h(t)$ for all $t \ge 0$ by linear interpolation of $h(n)$, and then define $\zeta(t) = \int_0^th(s)\,ds$. Note that $\zeta(t)$ is a continuously differentiable convex increasing function with $\|\zeta''\|_\infty \le 1$, and $\lim_{t\to\infty}\zeta(t)/t = \infty$. Also, since $\zeta'(t)$ is increasing and $\lambda(t)$ is decreasing,

$$\int_0^\infty\zeta'(t)\lambda(t)\,dt \le \sum_{n=0}^\infty\zeta'(n+1)\lambda(n) \le \sum_{n=0}^\infty(1 + g(n))\lambda(n) \le 3 ,$$

where the last inequality follows from (4.11). Since $\int_{\mathbb{R}^d}\zeta(|v|^2)F_0(v)\,dv = \int_0^\infty\zeta'(t)\lambda(t)\,dt$, (4.9) holds.
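The construction in this proof is explicit enough to run. The sketch below carries it out under the purely illustrative assumption that $X = |v|^2$ is exponentially distributed with mean 1, so that $\lambda(t) = e^{-t}$ and $\mu(t) = (1+t)e^{-t}$; the function names, the bisection routine and the truncation levels are likewise assumptions made for the example. It builds $t_k$, $g(n)$, $h(n)$ and $\zeta' = h$, and evaluates $\int\zeta'(t)\lambda(t)\,dt$ on a truncated range.

```python
import math

def tail_mass(t):
    """lambda(t) = P(X >= t) for X = |v|^2, here assumed Exp(1) so that E[X] = 1."""
    return math.exp(-t)

def tail_moment(t):
    """mu(t) = E[X ; X >= t] for the same assumed law."""
    return (1.0 + t) * math.exp(-t)

def level_time(k, hi=200.0):
    """t_k = inf{t : mu(t) <= 2^{-k}}, found by bisection since mu is decreasing."""
    lo, target = 0.0, 2.0 ** (-k)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail_moment(mid) > target:
            lo = mid
        else:
            hi = mid
    return hi

def build_g_h(n_max, k_max=60):
    """Return the integer sequences g(0..n_max) and h(0..n_max) of the construction."""
    t = [0.0] + [level_time(k) for k in range(1, k_max + 1)]
    g = [0] * (n_max + 1)
    h = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        g[n] = max(k for k in range(k_max + 1) if t[k] < n)
        h[n] = h[n - 1] + (1 if g[n] > g[n - 1] else 0)
    return g, h

def zeta_prime(t, h):
    """zeta'(t) = h(t), the piecewise-linear interpolation of the integers h(n)."""
    n = int(t)
    return h[n] + (t - n) * (h[n + 1] - h[n])

def localized_moment(h, t_max, steps_per_unit=200):
    """Midpoint-rule value of the integral of zeta'(t) * lambda(t) over [0, t_max]."""
    dt = 1.0 / steps_per_unit
    return sum(zeta_prime((i + 0.5) * dt, h) * tail_mass((i + 0.5) * dt) * dt
               for i in range(int(t_max * steps_per_unit)))

g, h = build_g_h(41)
moment = localized_moment(h, 40.0)
```

Because $h$ changes by at most 1 between consecutive integers, its linear interpolation has slope at most 1, which is the bound $\|\zeta''\|_\infty \le 1$ that the standard construction does not provide.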

We are now ready to prove the following: 

Theorem 4.2. Suppose $F_0$ is any element of $\mathcal{E}_{\theta,u}$, and suppose $\psi$ is a convex potential with $\nabla\psi\#F_1 = F_0$ such that $\psi$ and $F_1$ satisfy (4.4). Then there are a nonnegative, increasing convex function $\zeta(t)$ such that $\lim_{t\to\infty}\zeta(t)/t = \infty$ and $\|\zeta''\|_\infty \le 1$, and a finite constant $C$, both depending only on $F_0$, so that

$$\int_{\mathbb{R}^d}\zeta(\alpha|w-u|^2)F_1(w)\,dw \le C$$

for some $\alpha$ depending only on $h$, $W_2(F_0,F_1)$, and $\theta$.
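Proof. Without loss of generality, we continue to assume that $u = 0$ and $d\theta = 1$, and thus

$$\nabla\psi(w) = \alpha w + h\nabla\ln(F_1(w))$$

for some constant $\alpha > 0$ that is readily computed from (4.4). Now let $\zeta(t)$ be the increasing convex function provided by the variant of the de la Vallée-Poussin lemma. Then, $v \mapsto \zeta(|v|^2)$ is convex and so,

$$\begin{aligned}
\int_{\mathbb{R}^d}\zeta(|v|^2)F_0(v)\,dv &= \int_{\mathbb{R}^d}\zeta(|\nabla\psi(w)|^2)F_1(w)\,dw \\
&= \int_{\mathbb{R}^d}\zeta\big(|\alpha w + h\nabla\ln(F_1(w))|^2\big)F_1(w)\,dw \\
&\ge \int_{\mathbb{R}^d}\zeta(\alpha|w|^2)F_1(w)\,dw + 2h\alpha\int_{\mathbb{R}^d}\zeta'(\alpha|w|^2)\,w\cdot\nabla F_1(w)\,dw \\
&= \int_{\mathbb{R}^d}\zeta(\alpha|w|^2)F_1(w)\,dw - 2h\alpha^2\int_{\mathbb{R}^d}\zeta''(\alpha|w|^2)|w|^2F_1(w)\,dw \\
&\qquad - 2h\alpha d\int_{\mathbb{R}^d}\zeta'(\alpha|w|^2)F_1(w)\,dw .
\end{aligned}$$

Since $\int_{\mathbb{R}^d}|w|^2F_1(w)\,dw = 1$,

$$\int_{\mathbb{R}^d}\zeta(\alpha|w|^2)F_1(w)\,dw \le \int_{\mathbb{R}^d}\zeta(|v|^2)F_0(v)\,dv + 2h\alpha^2(1+d) , \tag{4.12}$$

where we are using the fact that $\|\zeta''\|_\infty \le 1$ and $\zeta'(t) \le t$ when $\zeta$ is the function provided by the above variant of the de la Vallée-Poussin lemma. □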



5. Existence of minimizers 

To simplify the notation, we fix $u = 0$ and $\theta = 1$ throughout this section. The main goal is to prove that a minimizer exists for (4.1). As explained in the introduction, it suffices to find a density $F_1 \in \mathcal{E}$ and a convex potential $\psi$ with $\nabla\psi\#F_1 = F_0$ such that the Euler-Lagrange equation (4.4) is satisfied.

In this, we make essential use of the dual version of the variational characterization of the 2-Wasserstein metric. This says that for all $F_0$ and $F$ in $\mathcal{E}$,

$$d - W_2^2(F_0,F) = \inf\left\{\int_{\mathbb{R}^d}\phi(v)F_0(v)\,dv + \int_{\mathbb{R}^d}\psi(w)F(w)\,dw \;\Big|\; \phi(v) + \psi(w) \ge v\cdot w \ \text{a.e.}\right\} \tag{5.1}$$

where 'almost everywhere' refers to the measure $F_0(v)F(w)\,dv\,dw$. Furthermore, the minimizing pair, which exists, consists of a dual pair of convex functions. That is, we may assume that $\phi$ and $\psi$ are Legendre transforms of one another. The gradients of the minimizing pair provide the optimal transport plans; i.e., $\nabla\phi\#F_0 = F$ and $\nabla\psi\#F = F_0$. A good reference for this is [3] or [8].

We shall make strong assumptions on $F_0 \in \mathcal{E}$, which we shall later remove; namely we suppose that $F_0$ is supported in $B_R$, the centered ball of radius $R$, and that on $B_R$ it is bounded below by some strictly positive number $\alpha$. Then for any other density $F$ in $\mathcal{P}$, these hypotheses impose some regularity on the optimal map $\nabla\psi\#F = F_0$. In particular,

$$|\nabla\psi(v)| \le R \tag{5.2}$$

for all $v$, which means that $\psi$ is Lipschitz.

Now define $\eta(t)$ by

$$\eta(t) = \begin{cases} +\infty & \text{if } t < 0, \\ t\ln t & \text{if } t \ge 0. \end{cases}$$

Then the Legendre transform $\eta^*(s)$ of $\eta(t)$ is $\eta^*(s) = e^{s-1}$. We shall use the notation $\eta^*$ throughout this section to emphasize the fact that we do not make much use of the specific form of $\eta$ in our analysis; this point is discussed further at the end of the section. Then

$$S(F) = \int_{\mathbb{R}^d}\eta(F)\,dv ,$$

and for any dual convex pair of functions $\phi$ and $\psi$,

$$I(F) \ge hS(F) + d - \left(\int_{\mathbb{R}^d}\phi(v)F_0(v)\,dv + \int_{\mathbb{R}^d}\psi(w)F(w)\,dw\right) \tag{5.3}$$

where $I(F)$ is given by (4.1). Moreover, by Young's inequality, $\eta(t) + \eta^*(s) \ge st$, and thus we have that for any $a \in \mathbb{R}^d$ and any $b \in \mathbb{R}$,

$$\eta(F(w)) + \eta^*\left(\frac{a\cdot w + b|w|^2/2 + \psi(w)}{h}\right) \ge \frac{a\cdot w + b|w|^2/2 + \psi(w)}{h}\,F(w) . \tag{5.4}$$
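The Legendre pair used here can be checked directly. The sketch below compares the closed form $\eta^*(s) = e^{s-1}$ with a brute-force supremum of $st - \eta(t)$ over a grid, and evaluates the gap in Young's inequality; the grid size, the cutoff `t_max` and the function names are illustrative assumptions.

```python
import math

def eta(t):
    """eta(t) = t ln t for t >= 0 (with eta(0) = 0) and +infinity for t < 0."""
    if t < 0:
        return math.inf
    return 0.0 if t == 0 else t * math.log(t)

def eta_star(s):
    """Closed form of the Legendre transform: sup over t of (s t - eta(t)) = exp(s - 1)."""
    return math.exp(s - 1)

def eta_star_numeric(s, t_max=50.0, n=200000):
    """Brute-force supremum of s t - eta(t) over a uniform grid of t in [0, t_max]."""
    return max(s * (i * t_max / n) - eta(i * t_max / n) for i in range(n + 1))

def young_gap(t, s):
    """eta(t) + eta*(s) - s t, which Young's inequality says is nonnegative."""
    return eta(t) + eta_star(s) - s * t
```

The gap vanishes exactly when $t = (\eta^*)'(s) = e^{s-1}$, which is the relation between $F_1$ and the potential that (5.11) encodes.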

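Integrating yields

$$hS(F) - \int_{\mathbb{R}^d}\psi(w)F(w)\,dw \ge b\frac{d}{2} - h\int_{\mathbb{R}^d}\eta^*\left(\frac{a\cdot w + b|w|^2/2 + \psi(w)}{h}\right)dw . \tag{5.5}$$

Therefore, introduce the functional

$$J(a,b,\phi,\psi) = d - \int_{\mathbb{R}^d}\phi(v)F_0(v)\,dv + b\frac{d}{2} - h\int_{\mathbb{R}^d}\eta^*\left(\frac{a\cdot w + b|w|^2/2 + \psi(w)}{h}\right)dw . \tag{5.6}$$

Note that $\phi$ is bounded below and $\eta^*$ is positive, and hence $J(a,b,\phi,\psi)$ is well-defined. It then follows from (5.3), (5.5) and (5.6) that for any dual convex pair of functions $\phi$ and $\psi$, $a \in \mathbb{R}^d$ and any $b \in \mathbb{R}$,

$$I(F) \ge J(a,b,\phi,\psi) . \tag{5.7}$$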
We let $\mathcal{U}$ denote the set of all quadruplets $(a,b,\phi,\psi)$ where $a \in \mathbb{R}^d$, $b \in \mathbb{R}$, and $\phi$ and $\psi$ are a pair of dual convex functions with

$$\phi(v) = \infty \quad\text{for } |v| > R . \tag{5.8}$$

The reason for this last condition is that increasing $\phi$ off of the support of $F_0$ can only decrease $\psi$ and hence increase $J$; so we may freely restrict our attention to such dual pairs; see [8] or [3]. This guarantees that (5.2) holds whenever $(a,b,\phi,\psi) \in \mathcal{U}$. Indeed, since $\psi$ is determined by $\phi$ through the Legendre transform, $J$ can be regarded as a functional of $a$, $b$ and $\phi$ alone. However, the notation with $\psi$ included as a variable is convenient for the exposition.

As we will see below,

$$\min\{I(F) \mid F \in \mathcal{E}\} = \max\{J(a,b,\phi,\psi) \mid (a,b,\phi,\psi) \in \mathcal{U}\} . \tag{5.9}$$

The parameters $a$ and $b$ will be seen to function as Lagrange multipliers guaranteeing that at the maximum on the right, $F_1 = \nabla\phi\#F_0$ does belong to $\mathcal{E}$.

Theorem 5.1. There exists $(a_0,b_0,\phi_0,\psi_0) \in \mathcal{U}$ such that

$$J(a_0,b_0,\phi_0,\psi_0) \ge J(a,b,\phi,\psi) \tag{5.10}$$

for all $(a,b,\phi,\psi) \in \mathcal{U}$. Furthermore, if

$$F_1(w) = (\eta^*)'\left(\frac{a_0\cdot w + b_0|w|^2/2 + \psi_0(w)}{h}\right) \tag{5.11}$$

then $F_1 \in \mathcal{E}$,

$$\nabla\psi_0\#F_1 = F_0 \tag{5.12}$$

and

$$\nabla\psi_0(w) = w + h\nabla\ln(F_1(w)) + \frac{hd - W_2^2(F_0,F_1)}{d}\,w . \tag{5.13}$$

Note that this gives us a solution of the Euler-Lagrange equation for the minimum of $I(F)$ that we derived in the last section. And indeed, since $\eta(t) + \eta^*(s) = st$ with

$$t = F_1 \quad\text{and}\quad s = \frac{a_0\cdot w + b_0|w|^2/2 + \psi_0(w)}{h} ,$$

there is equality in (5.4) with $F = F_1$, $\psi = \psi_0$. By (5.12), there is equality in (5.3) when $F = F_1$, $\psi = \psi_0$ and $\phi = \phi_0$. It follows that $I(F_1) = J(a_0,b_0,\phi_0,\psi_0)$. Together with (5.7), this proves that $F_1$ minimizes $I$ on $\mathcal{E}$. Thus Theorem 5.1 provides us with the minimizer of the original problem. The advantage of the $J$ functional lies in the compactness properties of the dual convex pairs.
Proof. First, suppose that the maximizer $(a_0,b_0,\phi_0,\psi_0)$ does exist. Observe that for any real number $\lambda$, $(a_0,b_0,\phi_0+\lambda,\psi_0-\lambda) \in \mathcal{U}$. Then by (5.10)

$$\frac{d}{d\lambda}J(a_0,b_0,\phi_0+\lambda,\psi_0-\lambda)\bigg|_{\lambda=0} = 0$$

and this clearly leads to

$$1 = \int_{\mathbb{R}^d}(\eta^*)'\left(\frac{a_0\cdot w + b_0|w|^2/2 + \psi_0(w)}{h}\right)dw . \tag{5.14}$$

Hence we see that (5.11) does define a probability density.

Next, we shall see below that for some $\epsilon > 0$,

$$\int_{\mathbb{R}^d}e^{\epsilon|w|^2}F_1(w)\,dw < \infty . \tag{5.15}$$

This implies that

$$(a,b) \mapsto \int_{\mathbb{R}^d}\eta^*\left(\frac{a\cdot w + b|w|^2/2 + \psi_0(w)}{h}\right)dw$$

is a differentiable function of $a$ and $b$ in some neighborhood of $(a_0,b_0)$. Assuming this for the moment, $\frac{\partial}{\partial b}J(a_0,b,\phi_0,\psi_0)\big|_{b=b_0} = 0$, and from this we have that

$$\frac{d}{2} = \int_{\mathbb{R}^d}\frac{|w|^2}{2}(\eta^*)'\left(\frac{a_0\cdot w + b_0|w|^2/2 + \psi_0(w)}{h}\right)dw$$

which means that $F_1$ does indeed satisfy the variance constraint. In the same way, differentiating in $a$ shows that $F_1$ does satisfy the mean constraint. Thus, $F_1 \in \mathcal{E}$.

So far, the only variation made in $\phi_0$, and hence in $\psi_0$, is a shift by an additive constant. We now let $\zeta$ be any smooth function supported in the interior of $B_R$, and define $\phi_t = \phi_0 + t\zeta$ and let $\psi_t$ be the Legendre transform of $\phi_t$. While these are not a dual pair of convex functions since $\phi_t$ may fail to be convex, it is nonetheless clear that for all sufficiently small $t$, $J(a_0,b_0,\phi_0,\psi_0) \ge J(a_0,b_0,\phi_t,\psi_t)$ and thus

$$\frac{d}{dt}J(a_0,b_0,\phi_t,\psi_t)\bigg|_{t=0} = 0 .$$

As in [10], $\lim_{t\to0}(\psi_t(w) - \psi_0(w))/t = -\zeta(\nabla\psi_0(w))$ and it follows that

$$\int_{\mathbb{R}^d}\zeta(v)F_0(v)\,dv = \int_{\mathbb{R}^d}\zeta(\nabla\psi_0(w))F_1(w)\,dw ,$$

which means that $\nabla\psi_0\#F_1 = F_0$.

The remaining part of the Euler-Lagrange equation follows from (5.11) by simple differentiation:

$$h\nabla F_1(w) = \big(a_0 + b_0w + \nabla\psi_0(w)\big)F_1(w) . \tag{5.16}$$

Hence $hw\cdot\nabla F_1(w) = \big(a_0\cdot w + b_0|w|^2 + w\cdot\nabla\psi_0(w)\big)F_1(w)$, and integrating both sides we obtain that

$$b_0 = -(1+h) + \frac{W_2^2(F_0,F_1)}{d} .$$

Even more simply, one sees by integrating (5.16) that $a_0 = 0$. Thus, provided the maximizer exists, and that $(a,b) \mapsto J(a,b,\phi_0,\psi_0)$ is differentiable in a neighborhood of $(a_0,b_0)$, we have that $F_1 \in \mathcal{E}$, $\nabla\psi_0\#F_1 = F_0$, and that the Euler-Lagrange equation (5.13) is satisfied.

To show the existence of an optimizer, we begin by considering any 
(a, b, (p, ijj). We now seek an a priori lower bound on </>(v). Fix any v € Br at 
which (j) is differentiable. Then let wq = V(j){v). Since ^ and cj) are dual to one 
another, v belongs to the subgradient of at wq, and then by the convexity 
of ■0, for any w G M"^, ipiw) > iP{'Wq) + v-{w — wq). 

Then since tp is convex, and because of the mononicity of (rf)' and its 
specific form, we have that 

( ^■w + h\w\^/2 + i^{w) ^ 

>e^v{{^{wo)-v-WQ)/h){rf) — 1 • 

Integrating, and using (5.14), we see that h is negative, and obtain 

(5.17) 1 > exp ((V'(m^o) - V ■ wo)/h) exp ( ^^^^l^ j 
But (p{v) = —{ip{wo) — Wq • v) and so 

(5.18) ct>iv)>\^-h(l + ^ln^ l^l 



2\b\ V 2 \27rh 

Integrating against Fq{v), we obtain that 

|2 



(5.19) I ,ivmv)iv > ^ + -L _ , (1 + ^ ,„ (|L)) . 

Now consider $\psi$ where $\psi(w) = (1 - h)(|w|^2/2) + h[1 - (d/2)\ln(2\pi)]$, so
that

$$(\eta^*)'\Big(\frac{-|w|^2/2 + \psi(w)}{h}\Big) = (2\pi)^{-d/2}e^{-|w|^2/2}.$$

The dual convex function of $\psi$ is $\phi$ where

$$\phi(v) = \frac{|v|^2}{2(1-h)} - h[1 - (d/2)\ln(2\pi)].$$

This does not satisfy (5.8), and hence $(0, -1, \phi, \psi)$ is not in $\mathcal{U}$. However, define
$\phi_R$ by $\phi_R(v) = \phi(v)$ for $|v| \le R$, and by $\phi_R(v) = \infty$ otherwise, and
define $\psi_R$ to be the dual convex function. Then $(0, -1, \phi_R, \psi_R)$ is in $\mathcal{U}$ and
$J(0, -1, \phi_R, \psi_R) \ge J(0, -1, \phi, \psi)$ since, as we have noted, increasing $\phi$ off the
support of $F_0$ can only decrease the dual $\psi$, and hence increase $J$. We denote
by $J_d(h)$ the finite real number $J(0, -1, \phi, \psi)$, depending only on $d$ and $h$. Since
it is clear that

$$\sup\{J(a, b, \phi, \psi) \mid (a, b, \phi, \psi) \in \mathcal{U}\} \ge J_d(h),$$

and we seek a maximizer of $J$, we need only consider $(a, b, \phi, \psi) \in \mathcal{U}$ such that

$$J(a, b, \phi, \psi) \ge J_d(h). \tag{5.20}$$
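The comparison pair used to define $J_d(h)$ can be checked numerically. The sketch below works in one dimension and makes two assumptions that are not spelled out in this passage: that the dual convex function is the Legendre transform $\sup_w (v w - \psi(w))$, and that $\eta^*(s) = e^{s-1}$ (the Legendre transform of $t \ln t$), which is consistent with the stated property $(\eta^*)' = \eta^*$. All function names are illustrative.

```python
import math

def psi(w, h, d=1):
    """Comparison function psi(w) = (1 - h)|w|^2/2 + h[1 - (d/2) ln(2 pi)]."""
    return (1.0 - h) * w * w / 2.0 + h * (1.0 - (d / 2.0) * math.log(2.0 * math.pi))

def legendre_numeric(v, h, w_max=10.0, n=40_001):
    """Grid approximation of sup over w of (v*w - psi(w)) in one dimension."""
    best = -math.inf
    for i in range(n):
        w = -w_max + 2.0 * w_max * i / (n - 1)
        best = max(best, v * w - psi(w, h))
    return best

def phi_closed_form(v, h, d=1):
    """Claimed dual: |v|^2 / (2(1 - h)) - h[1 - (d/2) ln(2 pi)]; needs 0 < h < 1."""
    return v * v / (2.0 * (1.0 - h)) - h * (1.0 - (d / 2.0) * math.log(2.0 * math.pi))

def eta_star(s):
    """ASSUMPTION: eta*(s) = exp(s - 1), so that (eta*)' = eta*."""
    return math.exp(s - 1.0)

def density_from_psi(w, h, b=-1.0):
    """eta*((b|w|^2/2 + psi(w))/h) with a = 0 and b = -1."""
    return eta_star((b * w * w / 2.0 + psi(w, h)) / h)

h = 0.2
dual_errors = [abs(legendre_numeric(v, h) - phi_closed_form(v, h)) for v in (-1.5, 0.0, 2.0)]
gauss_errors = [abs(density_from_psi(w, h) - math.exp(-w * w / 2.0) / math.sqrt(2.0 * math.pi))
                for w in (0.0, 0.5, 1.7)]
```

Under these assumptions the density attached to $(0, -1, \phi, \psi)$ is the standard Gaussian, which is why $J_d(h)$ depends only on $d$ and $h$.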

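Furthermore, we may suppose that we have already optimized over $\phi + \lambda$ and
$\psi - \lambda$ so that (5.14) holds. Then from the fact that $(\eta^*)' = \eta^*$,

$$J(a, b, \phi, \psi) = d - \int_{\mathbb{R}^d}\phi(v)F_0(v)\,dv + b\,\frac{d}{2} - h.$$

In light of this, and (5.20),

$$\int_{\mathbb{R}^d}\phi(v)F_0(v)\,dv \le -J_d(h) + d\Big(1 + \frac{b}{2}\Big) - h. \tag{5.21}$$

Combining (5.19) and (5.21) we obtain after simplification that

$$-J_d(h) + d\Big(1 + \frac{b}{2}\Big) \ge \frac{|a|^2}{2|b|} + \frac{d}{2|b|} + \frac{d}{2}\,h\ln(2\pi h) - \frac{d}{2}\,h\ln|b|. \tag{5.22}$$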

Recalling that $b$ is negative, it is clear that $|b|$ cannot be too close to zero,
for then the right-hand side becomes arbitrarily large while the left-hand side
remains bounded. Also, $|b|$ cannot be too large, since as $|b|$ increases, the
left-hand side tends linearly to $-\infty$, while the right-hand side only does so
logarithmically. Even more evidently, $|a|$ cannot be too large.

It follows that there is a constant $c > 0$, depending on $h$, so that

$$c \le |b| \le 1/c \quad\text{and}\quad |a| \le c. \tag{5.23}$$

Next, use (5.11) to define $F_1$; that is,

$$F_1(w) = (\eta^*)'\Big(\frac{a\cdot w + b|w|^2/2 + \psi(w)}{h}\Big). \tag{5.24}$$

We may suppose without loss of generality that $a$ and $b$ have been chosen
optimally so that $F_1 \in \mathcal{E}$. Since $\int_{\mathbb{R}^d}|v|^2F_1(v)\,dv = d$, Chebyshev's inequality gives

$$1/2 \le \int_{|w| \le \sqrt{2d}} F_1(w)\,dw \le 1.$$

This together with (5.2) and (5.24) means that for another finite constant $C$,

$$|\psi(w)| \le C + R|w| \tag{5.25}$$

for all $w$. In particular, with $F_1$ defined as in (5.24), (5.15) holds, as claimed.
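The mass bound used above is an instance of Chebyshev's inequality: a density with second moment $m$ carries at least half of its mass inside the ball of radius $\sqrt{2m}$. As a hedged illustration only, the sketch below compares that bound with the actual mass for a one-dimensional standard Gaussian (second moment 1, radius $\sqrt{2}$); the helper names are hypothetical.

```python
import math

def chebyshev_lower_bound(second_moment, radius):
    """Lower bound on the mass inside |w| <= radius: 1 - second_moment / radius^2."""
    return 1.0 - second_moment / radius ** 2

def gaussian_mass_within(radius, sigma=1.0):
    """Exact mass of a centered 1D Gaussian with standard deviation sigma in |w| <= radius."""
    return math.erf(radius / (sigma * math.sqrt(2.0)))

second_moment = 1.0                      # 1D standard Gaussian
radius = math.sqrt(2.0 * second_moment)  # radius sqrt(2m)
bound = chebyshev_lower_bound(second_moment, radius)
mass = gaussian_mass_within(radius)
```

The bound is far from sharp for a Gaussian, but the proof only needs a lower bound on the mass that is uniform over the constraint set.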

This gives all of the a priori estimates needed. Consider a maximizing sequence
$\{(a_n, b_n, \phi_n, \psi_n)\}$ in $\mathcal{U}$, each element of which satisfies (5.20). First we may
optimize in $a_n$ and $b_n$ and carry out the variation over $\phi_n + \lambda$ and $\psi_n - \lambda$. With
these chosen optimally, (5.14) holds.

Then by the previous paragraphs, $a_n$ and $b_n$ satisfy (5.23) for all $n$. Passing
to a subsequence, we may assume that $\{a_n\}$ and $\{b_n\}$ converge to the limits
$a_0$ and $b_0$ respectively.

Now for each $n$, define $F_1^{(n)}$ in terms of $a_n$, $b_n$ and $\psi_n$ using (5.24). Our
optimizing sequence is such that for each $n$, $F_1^{(n)} \in \mathcal{E}$, since, as we have seen,
this is what is guaranteed by optimality in $a$ and $b$. Moreover, since $a_n$ and
$b_n$ satisfy (5.23) for all $n$, it follows that (5.15) holds for all $n$ for some fixed
$\epsilon > 0$.

Passing to a further subsequence, we have that $\psi_0 = \lim_{n\to\infty}\psi_n$ exists
uniformly on compact sets due to (5.25) and the Lipschitz bound. Since for
each $n$, $F_1^{(n)}$ satisfies (5.15), the sequence $\{F_1^{(n)}\}$ converges strongly in $L^1$.

It is plain that on $B_R$, passing to a further subsequence if need be, we
have $\lim_{n\to\infty}\phi_n = \phi_0$ almost everywhere and

$$\lim_{n\to\infty}\int_{B_R}\phi_n(v)F_0(v)\,dv = \int_{B_R}\phi_0(v)F_0(v)\,dv.$$

Thus $J(a_0, b_0, \phi_0, \psi_0) = \lim_{n\to\infty} J(a_n, b_n, \phi_n, \psi_n)$. Since $\{(a_n, b_n, \phi_n, \psi_n)\}$ was
a maximizing sequence, $(a_0, b_0, \phi_0, \psi_0) \in \mathcal{U}$ is the desired maximizer, and all of
the properties of $F_1$ and $\psi_0$ claimed in the theorem have already been shown
to be consequences of the corresponding Euler-Lagrange equations. □

Thus, under our given conditions on $F_0$, we have proved the existence of a
minimizer $F_1$ of $I(F)$. Now consider an arbitrary element $F_0 \in \mathcal{E}$. Then there
exists a convex function $\zeta$ on $\mathbb{R}_+$ as in Section 4 such that $\zeta(t)/t$ increases to
infinity and

$$\int_{\mathbb{R}^d}\zeta(|v|^2)F_0(v)\,dv = C < \infty.$$

We approximate $F_0$ in $L^1(\mathbb{R}^d)$ by a sequence of densities $F_0^{(n)}$ such that

$$\int_{\mathbb{R}^d}\zeta(|v|^2)F_0^{(n)}(v)\,dv \le 2C$$

for all $n$, and such that for each $n$, $F_0^{(n)}$ is supported in $B_{R_n}$ for some radius
$R_n$. Let $F_1^{(n)}$ be the corresponding minimizer of $I(F)$. Then by Theorem 4.2,
there are numbers $\alpha > 0$ and $K < \infty$ so that

$$\int_{\mathbb{R}^d}\zeta(\alpha|v|^2)F_1^{(n)}(v)\,dv \le K \tag{5.26}$$

for all $n$.

By passing to a subsequence, we may suppose that $F_1^{(n)}$ converges weakly
to a probability density $F_1$. It is clear that the first moments converge, and by
(5.26) it is clear that the second moments converge as well, and hence $F_1 \in \mathcal{E}$.
Moreover, since convergence in the 2-Wasserstein metric is equivalent to weak
convergence and convergence of the second moments, $\lim_{n\to\infty} W_2^2(F_1^{(n)}, F_1) = 0$,
and $\lim_{n\to\infty} W_2^2(F_0^{(n)}, F_0) = 0$. Therefore,

$$W_2^2(F_1, F_0) = \lim_{n\to\infty} W_2^2(F_1^{(n)}, F_0^{(n)}).$$

Finally, by weak lower semicontinuity, $S(F_1) \le \liminf_{n\to\infty} S(F_1^{(n)})$. It follows
that $F_1$ is the minimizer we seek.

Then by dominated convergence, $F_1 = \lim_{n\to\infty} F_1^{(n)} \in \mathcal{E}$ and $F_1$ is the
desired minimizer. It is unique by strict convexity. Thus we have proven the
following result:

Theorem 5.2. For all $F_0 \in \mathcal{E}$, there exists a unique $F_1 \in \mathcal{E}$ such that

$$I(F_1) \le I(F)$$

for all $F \in \mathcal{E}$, where $I(F)$ is as defined in (4.1).

We note that on the basis of this result, there is a unique solution to the
discrete time evolution problem in which, given initial data $F_0 \in \mathcal{E}$ and a time
step $h > 0$, $F_n$ is defined iteratively in terms of $F_{n-1}$ by setting $F_n$ to be the
minimizer of

$$hS(F) + W_2^2(F_{n-1}, F)$$

over $\mathcal{E}$. We see easily, using the results of Section 4, that if we define $F^{(h)}(t, v)$
by an appropriate interpolation as in [12], then $\lim_{h\to 0} F^{(h)}(t, v) = F(t, v)$
where $F(t, v)$ solves the Fokker-Planck equation

$$\frac{\partial}{\partial t}F(t, v) = \nabla\cdot\Big(e^{-|v|^2/2}\,\nabla\big(e^{|v|^2/2}F(t, v)\big)\Big)$$

with initial data $F_0$. This equation is of course already well understood, but
we shall show in a related paper that this way of approaching it extends to
the nonlinear spatially inhomogeneous kinetic Fokker-Planck equation, which
is much less well understood.
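The Fokker-Planck operator $\nabla\cdot(e^{-|v|^2/2}\nabla(e^{|v|^2/2}F))$ expands to $\Delta F + \nabla\cdot(vF)$, so in one dimension a centered Gaussian of variance $s$ should have second-moment rate $2 - 2s$, which vanishes exactly on the constraint set (variance $d = 1$). The finite-difference sketch below is only a numerical sanity check of that statement; the grid sizes and function names are illustrative choices, not part of the paper.

```python
import math

def fokker_planck_rhs(F, v, dv):
    """Conservative finite-difference value of d/dv( e^{-v^2/2} d/dv( e^{v^2/2} F ) ) in 1D.
    The two boundary entries are left at zero."""
    G = [math.exp(0.5 * x * x) * f for x, f in zip(v, F)]
    # Flux e^{-v^2/2} G'(v), evaluated at the midpoints between grid nodes.
    flux = []
    for i in range(len(v) - 1):
        mid = 0.5 * (v[i] + v[i + 1])
        flux.append(math.exp(-0.5 * mid * mid) * (G[i + 1] - G[i]) / dv)
    rhs = [0.0] * len(v)
    for i in range(1, len(v) - 1):
        rhs[i] = (flux[i] - flux[i - 1]) / dv
    return rhs

def gaussian(v, variance):
    """Centered 1D Gaussian density sampled on the grid v."""
    norm = math.sqrt(2.0 * math.pi * variance)
    return [math.exp(-x * x / (2.0 * variance)) / norm for x in v]

def second_moment_rate(variance, half_width=10.0, n=2001):
    """Approximate d/dt of the second moment at time 0 for Gaussian initial data."""
    dv = 2.0 * half_width / (n - 1)
    v = [-half_width + i * dv for i in range(n)]
    rhs = fokker_planck_rhs(gaussian(v, variance), v, dv)
    return sum(x * x * r for x, r in zip(v, rhs)) * dv

rate_on_constraint = second_moment_rate(1.0)   # expected close to 0
rate_below = second_moment_rate(0.5)           # expected close to 2 - 2*0.5 = 1
rate_above = second_moment_rate(2.0)           # expected close to 2 - 2*2 = -2
```

The sign pattern shows the variance relaxing toward the constrained value, which is consistent with the limit equation conserving the second moment on $\mathcal{E}$.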
Open problems. We close this section by commenting on two open problems.
First, consider the variational problem employed by Jordan, Kinderlehrer
and Otto [12] to construct solutions of the heat equation:

$$\inf\{hS(F) + W_2^2(F, F_0)\} \tag{5.27}$$

in which no constraint is imposed on the variance of $F$. We conjecture that

$$\int_{\mathbb{R}^d}|v|^2F_1(v)\,dv \ge \int_{\mathbb{R}^d}|v|^2F_0(v)\,dv \tag{5.28}$$

where $F_1$ is the minimizer for (5.27). We can prove this under several additional
assumptions (when $h$ is not too small, when $F_0$ is radial, etc.), and we note
that if $F_t$ solves the heat equation,

$$\frac{d}{dt}\int_{\mathbb{R}^d}|v|^2F_t(v)\,dv = 2d \tag{5.29}$$

for any initial data $F_0$ with finite variance. In Section 3, we have given the
exact solution of this variational problem, and we see similar behavior in that
case. However, we have not been able to prove (5.28) in general. It would be
most unfortunate if the discrete time problem did not possess a good analog
of the basic monotonicity property (5.29), and we do not believe that this is
the case. If (5.28) were true, it would make it easy to prove Theorem 5.2 by
adding a Lagrange multiplier term $\lambda\int_{\mathbb{R}^d}|v|^2F(v)\,dv$ to the functional in (5.27).
The existence (and uniqueness) of minimizers would follow by the argument
in [12] for all $\lambda \ge 0$. Let $F^{(\lambda)}$ denote the minimizer corresponding to a given
value of $\lambda \ge 0$. If (5.28) were true, it would be easy to show the existence of
a value $\lambda_0 \ge 0$ for which $\int_{\mathbb{R}^d}|v|^2F^{(\lambda_0)}(v)\,dv = \int_{\mathbb{R}^d}|v|^2F_0(v)\,dv$. It would then
follow that $F^{(\lambda_0)}$ is the minimizer provided by Theorem 5.2.
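To make the conjectured inequality (5.28) concrete, here is a minimal sketch of one unconstrained step (5.27) in a very special case. It is not the paper's Section 3 computation. It assumes one dimension, centered Gaussian data with standard deviation `sigma0`, minimization restricted to centered Gaussians, $S(F) = \int F \ln F$, and $W_2^2$ normalized with cost $|v - w|^2/2$ (chosen so that the growth rate matches the value $2d$ in (5.29)). Under these assumptions the objective reduces to a function of a single standard deviation.

```python
import math

def objective(sigma, sigma0, h):
    """h*S(F) + W_2^2(F, F0) for centered 1D Gaussians, under the assumed normalizations."""
    entropy = -math.log(sigma) - 0.5 * math.log(2.0 * math.pi * math.e)  # integral of F ln F
    wasserstein_sq = 0.5 * (sigma - sigma0) ** 2                         # cost |v - w|^2 / 2
    return h * entropy + wasserstein_sq

def gaussian_step(sigma0, h):
    """Stationary point of the objective: positive root of sigma^2 - sigma0*sigma - h = 0."""
    return 0.5 * (sigma0 + math.sqrt(sigma0 * sigma0 + 4.0 * h))

sigma0, h = 1.0, 1e-3                        # illustrative values
sigma1 = gaussian_step(sigma0, h)
growth_rate = (sigma1 ** 2 - sigma0 ** 2) / h   # discrete analog of (5.29), d = 1
```

In this restricted setting the variance strictly increases, since $\sigma_1^2 = \sigma_0\sigma_1 + h > \sigma_0^2 + h$, and the growth rate tends to 2 as $h \to 0$. This only illustrates the conjecture for Gaussians and says nothing about general data.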

Another open problem concerns the growth of higher moments. We note
that if $F_t$ solves the heat equation for any initial data $F_0$ with zero mean and
finite fourth moments,

/ \v\'^Ft{v)dv = 12de . 

This leads one to hope that if $F_1$ is the minimizer for (5.27), and $F_0$ has zero
mean and, say, finite sixth moments, there is a constant $C$ depending only on,
say, the sixth moments so that

(5.30) / \vfFi{v)dv<{l + Ch) [ \vfFo{v)dv . 

This would be helpful in studying the nonlinear kinetic Fokker-Planck equation
by these methods. We conjecture that this is true. We note that to prove
(5.30), one needs an upper bound on the moments of the minimizer $F_1$, while
to prove (5.28), one needs a lower bound.

Georgia Institute of Technology, Atlanta, GA 